Portkey provides a robust and secure platform to observe, govern, and manage your locally or privately hosted custom models using vLLM.
Here’s a list of all model architectures supported on vLLM.
Integrating Custom Models with Portkey SDK
Expose your vLLM Server
Expose your vLLM server by using a tunneling service like ngrok or any other way you prefer. You can skip this step if you’re self-hosting the Gateway.
ngrok http 8000 --host-header="localhost:8000"
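If the model isn't already running, start vLLM's OpenAI-compatible server first. A minimal sketch, assuming a recent vLLM release (older versions expose the same server via python -m vllm.entrypoints.openai.api_server); the model name is only illustrative:
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000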
Install the Portkey SDK
npm install --save portkey-ai
pip install portkey-ai
Initialize Portkey with vLLM custom URL
- Pass your publicly-exposed vLLM server URL to Portkey with customHost (by default, vLLM runs on http://localhost:8000/v1)
- Set the target provider as openai, since the server follows the OpenAI API schema.
import Portkey from 'portkey-ai'

const portkey = new Portkey({
  apiKey: "PORTKEY_API_KEY",
  provider: "openai",
  customHost: "https://7cc4-3-235-157-146.ngrok-free.app", // Your vLLM ngrok URL
  Authorization: "AUTH_KEY", // If you need to pass auth
})
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="openai",
    custom_host="https://7cc4-3-235-157-146.ngrok-free.app",  # Your vLLM ngrok URL
    Authorization="AUTH_KEY",  # If you need to pass auth
)
More on custom_host here.
Invoke Chat Completions
Use the Portkey SDK to invoke chat completions from your model, just as you would with any other provider:
const chatCompletion = await portkey.chat.completions.create({
  messages: [{ role: 'user', content: 'Say this is a test' }]
});

console.log(chatCompletion.choices);
completion = portkey.chat.completions.create(
    messages=[{"role": "user", "content": "Say this is a test"}]
)

print(completion)
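Streaming responses work the same way through the gateway. A minimal sketch, assuming the Python portkey client initialized above and that the chunks follow the OpenAI streaming schema:
stream = portkey.chat.completions.create(
    messages=[{"role": "user", "content": "Say this is a test"}],
    stream=True  # ask the gateway to stream tokens as they are generated
)

for chunk in stream:
    # delta.content can be None on role/initial chunks, hence the fallback to ""
    print(chunk.choices[0].delta.content or "", end="")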
Virtual Keys serve as Portkey’s unified authentication system for all LLM interactions, simplifying the use of multiple providers and Portkey features within your application. For self-hosted LLMs, you can configure custom authentication requirements including authorization keys, bearer tokens, or any other headers needed to access your model:
- Navigate to Virtual Keys in your Portkey dashboard
- Click “Add Key” and enable the “Local/Privately hosted provider” toggle
- Configure your deployment:
  - Select the matching provider API specification (typically OpenAI)
  - Enter your model’s base URL in the Custom Host field
  - Add required authentication headers and their values
- Click “Create” to generate your virtual key
You can now use this virtual key in your requests:
const portkey = new Portkey({
  apiKey: "PORTKEY_API_KEY",
  virtualKey: "YOUR_SELF_HOSTED_LLM_VIRTUAL_KEY"
});

async function main() {
  const response = await portkey.chat.completions.create({
    messages: [{ role: "user", content: "Bob the builder.." }],
    model: "your-self-hosted-model-name",
  });

  console.log(response.choices[0].message.content);
}

main();
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="YOUR_SELF_HOSTED_LLM_VIRTUAL_KEY"
)

response = portkey.chat.completions.create(
    model="your-self-hosted-model-name",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ]
)

print(response)
For more information about managing self-hosted LLMs with Portkey, see Bring Your Own LLM.
Next Steps
Explore the complete list of features supported in the SDK. You’ll find more information in the relevant sections:
- Add metadata to your requests
- Add gateway configs to your requests
- Tracing requests
- Set up a fallback from OpenAI to your local LLM (a minimal config sketch below)
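As an example of the last item, a fallback from OpenAI to your self-hosted model can be expressed as a gateway config attached to the client. A minimal sketch, assuming the virtual key IDs are placeholders for keys created in your dashboard and that override_params pins each target to its own model name:
from portkey_ai import Portkey

# Try OpenAI first; if that request fails, retry against the self-hosted vLLM deployment
config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"virtual_key": "YOUR_OPENAI_VIRTUAL_KEY", "override_params": {"model": "gpt-4o"}},
        {"virtual_key": "YOUR_SELF_HOSTED_LLM_VIRTUAL_KEY", "override_params": {"model": "your-self-hosted-model-name"}}
    ]
}

portkey = Portkey(api_key="PORTKEY_API_KEY", config=config)

response = portkey.chat.completions.create(
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)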