Integrate vLLM-hosted custom models with Portkey and take them to production
Portkey provides a robust and secure platform to observe, govern, and manage your locally or privately hosted custom models using vLLM.
Here’s a list of all model architectures supported by vLLM.
Integrating Custom Models with Portkey SDK
Expose your vLLM Server
Expose your vLLM server by using a tunneling service like ngrok or any other way you prefer. You can skip this step if you’re self-hosting the Gateway.
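As a quick sanity check (a sketch, not part of the Portkey flow), you can confirm the exposed endpoint speaks the OpenAI API before wiring it into Portkey. The tunnel URL below is a placeholder for your own deployment:

```python
# Sketch: verify the publicly exposed vLLM endpoint responds to OpenAI-style requests.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-tunnel.ngrok.app/v1",  # placeholder: your tunneled vLLM URL
    api_key="EMPTY",  # any string works unless the vLLM server was started with an API key
)

# Should list the model(s) your vLLM server is serving
print(client.models.list())
```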
Install the Portkey SDK
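Assuming the Python SDK used in the sketches below, the package is published as `portkey-ai`, so `pip install portkey-ai` sets it up (the Node SDK is likewise installed with `npm install portkey-ai`).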
Initialize Portkey with vLLM custom URL
- Pass your publicly-exposed vLLM server URL to Portkey with `customHost` (by default, vLLM runs on `http://localhost:8000/v1`)
- Set the target `provider` as `openai`, since the server follows the OpenAI API schema.

More on `custom_host` here.
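A minimal initialization sketch with the Python SDK (where the parameter is written `custom_host`); the API key and server URL are placeholders:

```python
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",                       # placeholder: your Portkey API key
    provider="openai",                               # vLLM exposes an OpenAI-compatible API
    custom_host="https://your-tunnel.ngrok.app/v1",  # placeholder: your exposed vLLM server URL
)
```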
Invoke Chat Completions
Use the Portkey SDK to invoke chat completions from your model, just as you would with any other provider:
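A sketch of a chat completion call, continuing from the client initialized above; the model name is a placeholder for whatever your vLLM server is serving:

```python
# Sketch: a chat completion routed through Portkey to your vLLM server.
completion = portkey.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder: the model served by your vLLM server
    messages=[{"role": "user", "content": "Say this is a test"}],
)

print(completion.choices[0].message.content)
```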
Using Virtual Keys
Virtual Keys serve as Portkey’s unified authentication system for all LLM interactions, simplifying the use of multiple providers and Portkey features within your application. For self-hosted LLMs, you can configure custom authentication requirements including authorization keys, bearer tokens, or any other headers needed to access your model:
- Navigate to Virtual Keys in your Portkey dashboard
- Click “Add Key” and enable the “Local/Privately hosted provider” toggle
- Configure your deployment:
  - Select the matching provider API specification (typically `OpenAI`)
  - Enter your model’s base URL in the `Custom Host` field
  - Add required authentication headers and their values
- Click “Create” to generate your virtual key
You can now use this virtual key in your requests:
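A sketch with the Python SDK, assuming the virtual key created above (the key value and model name are placeholders):

```python
from portkey_ai import Portkey

# With a virtual key, the custom host and auth headers are stored in Portkey,
# so the client only needs the Portkey API key and the virtual key.
portkey = Portkey(
    api_key="PORTKEY_API_KEY",             # placeholder: your Portkey API key
    virtual_key="YOUR_VLLM_VIRTUAL_KEY",   # placeholder: the key created in the dashboard
)

completion = portkey.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder: the model served by your vLLM server
    messages=[{"role": "user", "content": "Hello from Portkey and vLLM"}],
)
print(completion.choices[0].message.content)
```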
For more information about managing self-hosted LLMs with Portkey, see Bring Your Own LLM.
Next Steps
Explore the complete list of features supported in the SDK. You’ll find more information in the relevant sections of the Portkey documentation.