Portkey provides a robust and secure platform to observe, govern, and manage your locally or privately hosted custom models using vLLM.

Here’s a list of all model architectures supported on vLLM.

Integrating Custom Models with Portkey SDK

1. Expose your vLLM Server

Expose your vLLM server using a tunneling service like ngrok, or any other method you prefer. You can skip this step if you’re self-hosting the Portkey Gateway.

ngrok http 8000 --host-header="localhost:8000"
2. Install the Portkey SDK

npm install --save portkey-ai
3. Initialize Portkey with vLLM custom URL

  1. Pass your publicly exposed vLLM server URL to Portkey with customHost (by default, vLLM runs on http://localhost:8000/v1).
  2. Set the target provider to openai, since the vLLM server follows the OpenAI API schema.
import Portkey from 'portkey-ai'

const portkey = new Portkey({
    apiKey: "PORTKEY_API_KEY",
    provider: "openai",
    customHost: "https://7cc4-3-235-157-146.ngrok-free.app", // Your vLLM ngrok URL
    Authorization: "AUTH_KEY", // If you need to pass auth
})

More on custom_host here.

4. Invoke Chat Completions

Use the Portkey SDK to invoke chat completions from your model, just as you would with any other provider:

const chatCompletion = await portkey.chat.completions.create({
    model: 'MODEL_NAME', // the model your vLLM server is serving
    messages: [{ role: 'user', content: 'Say this is a test' }]
});

console.log(chatCompletion.choices);
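
You can also stream the response. The sketch below assumes the Portkey SDK mirrors the OpenAI SDK's streaming interface (pass stream: true and iterate over the returned chunks); MODEL_NAME is a placeholder for whatever model your vLLM server is serving.

const stream = await portkey.chat.completions.create({
    model: 'MODEL_NAME', // placeholder: the model your vLLM server is serving
    messages: [{ role: 'user', content: 'Say this is a test' }],
    stream: true
});

for await (const chunk of stream) {
    // Each chunk carries an incremental delta, OpenAI-style
    process.stdout.write(chunk.choices[0]?.delta?.content || '');
}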

Next Steps

Explore the complete list of features supported in the SDK.


You’ll find more information in the relevant sections:

  1. Add metadata to your requests
  2. Add gateway configs to your requests
  3. Tracing requests
  4. Set up a fallback from OpenAI to your local LLM (see the config sketch below)
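
As an illustration of the fallback pattern in item 4, here is a minimal sketch of a gateway config that tries OpenAI first and falls back to your vLLM deployment. It assumes the strategy/targets config schema with override_params and that the SDK constructor accepts a config object; the keys, the ngrok URL, gpt-4o, and MODEL_NAME are placeholders you would swap for your own values.

import Portkey from 'portkey-ai'

// Sketch: fallback from OpenAI to the self-hosted vLLM server.
// Assumes the strategy/targets gateway config schema; keys and URLs are placeholders.
const portkey = new Portkey({
    apiKey: "PORTKEY_API_KEY",
    config: {
        strategy: { mode: "fallback" },
        targets: [
            {
                provider: "openai", // primary target: OpenAI
                api_key: "OPENAI_API_KEY"
            },
            {
                provider: "openai", // vLLM follows the OpenAI schema
                custom_host: "https://7cc4-3-235-157-146.ngrok-free.app", // your vLLM URL
                override_params: { model: "MODEL_NAME" } // model served by vLLM
            }
        ]
    }
})

const completion = await portkey.chat.completions.create({
    model: 'gpt-4o', // primary model; overridden for the vLLM target
    messages: [{ role: 'user', content: 'Say this is a test' }]
});

console.log(completion.choices);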