In the Guardrail configuration UI, you'll need to provide:
| Field | Description | Type |
| :-------------- | :--------------------------------------- | :------------ |
| **Webhook URL** | Your webhook's endpoint URL | `string` |
| **Headers** | Headers to include with webhook requests | `JSON` |
| **Timeout** | Maximum wait time for webhook response | `number` (ms) |
#### Webhook URL
This should be a publicly accessible URL where your webhook is hosted.
Based on your access level, you might see the relevant permissions on the API key modal - tick the ones you'd like, name your API key, and save it.
## 2. Integrate Portkey
Portkey offers a variety of integration options, including SDKs, REST APIs, and native connections with platforms like OpenAI, Langchain, and LlamaIndex, among others.
### Through the OpenAI SDK
If you're using the **OpenAI SDK**, import the Portkey SDK and configure it within your OpenAI client object:
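Here's a minimal Python sketch of that pattern, assuming the `portkey_ai` package's `createHeaders` helper and a placeholder virtual key:

```py theme={"system"}
from openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

# Point the OpenAI client at Portkey's gateway and attach Portkey headers
client = OpenAI(
    api_key="OPENAI_API_KEY",
    base_url=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        virtual_key="OPENAI_VIRTUAL_KEY",  # placeholder virtual key
    ),
)

chat = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(chat.choices[0].message.content)
```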
## Understanding API Key Types
**Note**: Service API keys provide system-level access and are distinct from user API keys which grant individual user access.
* **Service API Keys**: Used for automated processes, integrations, and system-level operations. These keys typically have broader permissions and are not tied to individual users.
* **User API Keys**: Associated with specific users and provide individualized access to Portkey resources. These keys are generally more limited in scope and tied to the permissions of the specific user.
## Logs vs. Logs Metadata
* **Logs**: Complete log entries including request and response payloads
* **Logs Metadata**: Information such as timestamps, model used, tokens consumed, and other metrics without the actual content
## Understanding Virtual Keys
Virtual keys in Portkey securely store provider credentials and enable:
* Centralized management of AI provider keys
* Abstraction of actual provider keys from end users
* Definition of routing rules, fallbacks, and other advanced features
* Application of usage limits and tracking across providers
By controlling who can view and manage these virtual keys, organizations can maintain security while enabling appropriate access for different team roles.
### Logging Modes
#### Full Logging
When enabled, Portkey stores:
* Complete request payloads
* Full response content
* All associated metrics and metadata
This mode provides comprehensive observability for debugging, monitoring, and optimization purposes.
#### Metrics Only (Privacy Mode)
When enabled, Portkey only tracks:
* Usage statistics (tokens, latency, costs)
* Request metadata
* Error information (without sensitive content)
This mode ensures privacy compliance by not storing any request or response content, while still maintaining essential operational metrics.
### Workspace-Level Control
The **"Allow respective workspace managers to toggle Request Logging"** option determines whether workspace managers can override the organization-level settings:
* **When enabled**: Workspace managers can change logging settings for their specific workspace
* **When disabled**: All workspaces inherit and must use the organization-level logging settings
### Alert Thresholds
You can configure alert thresholds to receive notifications before reaching your full budget:
1. Enter a value in the **Alert Threshold** field
2. When usage reaches this threshold, notifications will be sent to configured recipients
3. The API key continues to function until the full budget limit is reached
### Periodic Reset Options
Budget limits can be set to automatically reset at regular intervals:
* **No Periodic Reset**: The budget limit applies until exhausted
* **Reset Weekly**: Budget limits reset every Sunday at 12 AM UTC
* **Reset Monthly**: Budget limits reset on the 1st of each month at 12 AM UTC
## Rate Limits
Rate limits control how frequently an API key can be used, helping you maintain application performance and prevent unexpected usage spikes.
### Setting Up Rate Limits
When creating a new API key or editing an existing one:
1. Toggle on **Add Rate Limit**
2. Choose your limit type:
* **Requests**: Limit based on number of API calls
* **Tokens**: Limit based on token consumption
3. Specify the limit value and time interval
### Time Intervals
Rate limits can be applied using three different time intervals:
* **Per Minute**: For granular control of high-frequency applications
* **Per Hour**: For balanced control of moderate usage
* **Per Day**: For broader usage management
When a rate limit is reached, subsequent requests are rejected until the time interval resets.
## Email Notifications
Email notifications keep relevant stakeholders informed about API key usage and when limits are approached or reached.
### Configuring Notifications
To set up email notifications for an API key with budget limits:
1. Toggle on **Email Notifications** when creating/editing an API key
2. Add recipient email addresses:
* Type an email address and click **New** or press Enter
* Add multiple recipients as needed
### Default Recipients
When limits are reached or thresholds are crossed, Portkey automatically sends notifications to:
* Organization administrators
* Organization owners
* The API key creator/owner
You can add additional recipients such as finance team members, department heads, or project managers who need visibility into AI usage.
## Availability
These features are available to Portkey Enterprise customers and select Pro users. To enable these features for your account, please contact [support@portkey.ai](mailto:support@portkey.ai) or join the [Portkey Discord](https://portkey.ai/community) community.
To learn more about the Portkey Enterprise plan, [schedule a consultation](https://portkey.sh/demo-16).
# Enforce Budget Limits on Your AI Provider
Source: https://docs.portkey.ai/docs/product/administration/enforce-budget-limits-on-your-ai-provider
# Enforcing Default Configs on API Keys
Source: https://docs.portkey.ai/docs/product/administration/enforce-default-config
Learn how to attach default configs to API keys for enforcing governance controls across your organization
## Overview
Portkey allows you to attach default configs to API keys, enabling you to enforce specific routing rules, security controls, and other governance measures across all API calls made with those keys. This feature provides a powerful way to implement organization-wide policies without requiring changes to individual application code.
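For example, once a default config is attached to an API key, application code can stay a plain chat-completions call and the attached routing and governance rules apply automatically. A minimal sketch (key and model values are placeholders):

```py theme={"system"}
from portkey_ai import Portkey

# No config is referenced here: the default config attached to this API key
# (fallbacks, guardrails, caching, provider targets, etc.) is applied by Portkey.
portkey = Portkey(api_key="API_KEY_WITH_DEFAULT_CONFIG")

response = portkey.chat.completions.create(
    model="gpt-4o",  # placeholder model
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```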
## Setting Rate Limits
Rate limits control how frequently requests can be made from a workspace, helping you maintain application performance and prevent unexpected usage spikes.
To set up rate limits for a workspace:
1. Toggle on **Add Rate Limit**
2. Choose your limit type:
* **Requests**: Limit based on number of API calls
* **Tokens**: Limit based on token consumption
3. Specify the limit value in the **Set Rate Limit** field
4. Select a time interval from the dropdown (Requests/Minute, Requests/Hour, Requests/Day)
When a workspace reaches its rate limit, all subsequent requests from that workspace will be rejected until the time interval resets, regardless of which API key is used.
## Setting Budget Limits
Budget limits allow you to set maximum spending or token usage thresholds for an entire workspace, automatically preventing further usage when limits are reached.
To set up budget limits for a workspace:
1. Toggle on **Add Budget**
2. Choose your limit type:
* **Cost**: Set a maximum spend in USD
* **Tokens**: Set a maximum token usage
3. Enter the budget amount in the **Budget Limit (\$)** field
4. Optionally, set an **Alert Threshold (\$)** to receive notifications before reaching the full budget
5. Select a **Periodic Reset** option to determine when the budget refreshes
### Alert Thresholds
Alert thresholds trigger notifications once workspace spending reaches a specified amount, before the full budget is exhausted:
1. Enter a value in the **Alert Threshold (\$)** field
2. When usage reaches this threshold, notifications will be sent to configured recipients
3. The workspace continues to function until the full budget limit is reached
### Periodic Reset Options
Workspace budgets can be set to automatically reset at regular intervals:
* **No Periodic Reset**: The budget limit applies until exhausted
* **Reset Weekly**: Budget limits reset every Sunday at 12 AM UTC
* **Reset Monthly**: Budget limits reset on the 1st of each month at 12 AM UTC
> The configuration options for workspace budgets and rate limits are the same as other budget and rate limit controls in the Portkey app (for API keys and providers). If you're familiar with those, this will feel identical—just applied at the workspace level.
## Notification System
When workspace limits are approached or reached, Portkey automatically sends notifications to:
* Organization administrators and owners
## Use Cases
Workspace budget limits are particularly useful for:
* **Departmental Allocations**: Assign specific AI budgets to different departments (Marketing, Customer Support, R\&D)
* **Project Management**: Allocate resources based on project priority and requirements
* **Cost Center Tracking**: Monitor and control spending across different cost centers
* **Phased Rollouts**: Gradually increase limits as teams demonstrate value and mature their AI use cases
### Set Workspace Budget and Rate Limits using Portkey Admin API
Use the Admin API to programmatically manage workspace budgets and rate limits.
* The endpoint updates a workspace and accepts both `usage_limits` (budgets) and `rate_limits`.
* Budget limits mirror the app controls: cost- or token-based, optional `alert_threshold`, and optional `periodic_reset` of `weekly` or `monthly`.
* Rate limits support request- or token-based throttling with units `rpm` (per minute), `rph` (per hour), or `rpd` (per day).
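A minimal sketch of such a call with Python's `requests`. The HTTP method, endpoint path, and the field names inside `usage_limits` and `rate_limits` are assumptions based on the description above; check the Admin API reference for the authoritative schema:

```py theme={"system"}
import requests

PORTKEY_API_KEY = "YOUR_ADMIN_API_KEY"
WORKSPACE_ID = "YOUR_WORKSPACE_ID"

# Assumed endpoint path for the workspace update call
url = f"https://api.portkey.ai/v1/admin/workspaces/{WORKSPACE_ID}"

payload = {
    # Budget: cost-based limit with an alert threshold and a monthly reset
    # (nested field names are illustrative)
    "usage_limits": {
        "type": "cost",
        "credit_limit": 500,
        "alert_threshold": 400,
        "periodic_reset": "monthly",
    },
    # Rate limit: 100 requests per minute ("rpm"; "rph" and "rpd" also supported)
    "rate_limits": [
        {"type": "requests", "unit": "rpm", "value": 100}
    ],
}

resp = requests.put(url, json=payload, headers={"x-portkey-api-key": PORTKEY_API_KEY})
resp.raise_for_status()
print(resp.json())
```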
You can also [self-host](https://github.com/Portkey-AI/gateway/blob/main/docs/installation-deployments.md) the gateway and then connect it to Portkey. Please reach out on [hello@portkey.ai](mailto:hello@portkey.ai) and we'll help you set this up!
# Automatic Retries
Source: https://docs.portkey.ai/docs/product/ai-gateway/automatic-retries
LLM APIs often have inexplicable failures. With Portkey, you can rescue a substantial number of your requests with our in-built automatic retries feature.
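Retries are configured through a config object. A minimal sketch, assuming the standard `retry` block with `attempts` and an optional `on_status_codes` list, passed here as a Python dict to the Portkey SDK (you can also save the config in the UI and pass its ID instead):

```py theme={"system"}
from portkey_ai import Portkey

# Retry up to 3 times, only on the listed status codes
retry_config = {
    "retry": {
        "attempts": 3,
        "on_status_codes": [429, 500, 502, 503, 504],
    }
}

portkey = Portkey(api_key="PORTKEY_API_KEY", config=retry_config)

response = portkey.chat.completions.create(
    model="@openai-prod/gpt-4o",  # placeholder provider/model slug
    messages=[{"role": "user", "content": "Hello"}],
)
```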
For each request we also calculate and show the cache response time and how much money you saved with each hit.
***
## How Cache works with Configs
You can set cache at two levels:
* **Top-level** that works across all the targets.
* **Target-level** that works when that specific target is triggered.
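A sketch of the two placements, assuming the standard `cache` block with a `mode` of `simple` or `semantic`; provider slugs and values are illustrative:

```py theme={"system"}
# Top-level cache: applies across all targets
config_top_level = {
    "cache": {"mode": "semantic", "max_age": 3600},
    "strategy": {"mode": "fallback"},
    "targets": [
        {"provider": "@openai-prod"},
        {"provider": "@anthropic-prod"},
    ],
}

# Target-level cache: applies only when that specific target is triggered
config_target_level = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"provider": "@openai-prod", "cache": {"mode": "simple"}},
        {"provider": "@anthropic-prod"},  # no cache for this target
    ],
}
```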
## Enabling Fallback on LLMs
To enable fallbacks, you can modify the [config object](/api-reference/config-object) to include the `fallback` mode.
Here's a quick example of a config to **fallback** to Anthropic's `claude-3.5-sonnet` if OpenAI's `gpt-4o` fails.
```JSON theme={"system"}
{
  "strategy": {
    "mode": "fallback"
  },
  "targets": [
    {
      "provider": "@openai-virtual-key",
      "override_params": {
        "model": "gpt-4o"
      }
    },
    {
      "provider": "@anthropic-virtual-key",
      "override_params": {
        "model": "claude-3.5-sonnet-20240620"
      }
    }
  ]
}
```
In this scenario, if the OpenAI model encounters an error or fails to respond, Portkey will automatically retry the request with Anthropic.
[Using Configs in your Requests](/product/ai-gateway/configs#using-configs)
## Triggering fallback on specific error codes
By default, fallback is triggered on any request that returns a **non-2xx** status code.
You can change this behaviour by setting the optional `on_status_codes` param in your fallback config and manually inputting the status codes on which fallback will be triggered.
```json theme={"system"}
{
  "strategy": {
    "mode": "fallback",
    "on_status_codes": [429]
  },
  "targets": [
    {
      "provider": "@openai-virtual-key"
    },
    {
      "provider": "@azure-openai-virtual-key"
    }
  ]
}
```
Here, the fallback from OpenAI to Azure OpenAI will only be triggered when the OpenAI request returns a `429` status code (i.e., a rate-limit error).
## Tracing Fallback Requests on Portkey
Portkey logs all the requests that are sent as a part of your fallback config. This allows you to easily trace and see which targets failed and see which ones were eventually successful.
To see your fallback trace,
1. On the Logs page, first filter the logs with the specific `Config ID` where you've setup the fallback - this will show all the requests that have been sent with that config.
2. Now, trace an individual request and all the failed + successful logs for it by filtering further on `Trace ID` - this will show all the logs originating from a single request.
## Caveats and Considerations
While the Fallback on LLMs feature greatly enhances the reliability and resilience of your application, there are a few things to consider:
1. Ensure the LLMs in your fallback list are compatible with your use case. Not all LLMs offer the same capabilities.
2. Keep an eye on your usage with each LLM. Depending on your fallback list, a single request could result in multiple LLM invocations.
3. Understand that each LLM has its own latency and pricing. Falling back to a different LLM could have implications on the cost and response time.
# Files
Source: https://docs.portkey.ai/docs/product/ai-gateway/files
Upload files to Portkey and reuse the content in your requests
Portkey supports managing files in two ways:
1. **Provider Files**: Uploading and managing files directly to any provider using the unified signature
2. **Portkey Files**: Uploading files to Portkey and using them for batching/fine-tuning requests with any provider
***
## 1. Provider Files
Upload and manage files directly to providers (OpenAI, Bedrock, etc.) using Portkey's unified API signature. This approach is useful when you need provider-specific file features or want to manage files directly on the provider's platform.
### Supported Providers
* [OpenAI](/integrations/llms/openai/files)
* [Bedrock](/integrations/llms/bedrock/files)
* [Azure OpenAI](/integrations/llms/azure-openai/files)
* [Fireworks](/integrations/llms/fireworks/files)
* [Vertex AI](/integrations/llms/vertex-ai/files)
### Quick Example
```bash theme={"system"}
curl -X POST https://api.portkey.ai/v1/files \
-H "Authorization: Bearer $PORTKEY_API_KEY" \
-H "x-portkey-provider: openai" \
-F "purpose=fine-tune" \
-F "file=@training_data.jsonl"
```
***
## 2. Portkey Files
Upload files to Portkey and reuse them for [batching inference](/product/ai-gateway/batches) with any provider and [fine-tuning](/product/ai-gateway/fine-tuning) with supported providers. This approach is ideal for:
* Testing your data with different foundation models
* Performing A/B testing across multiple providers
* Batch inference with provider-agnostic file management
* Reusing the same file across multiple batch jobs
### File Requirements
* **Format**: JSONL files where each line contains a single request payload
### Uploading Files
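A minimal sketch of uploading a JSONL file to Portkey with the Python SDK, assuming the OpenAI-style `files.create` method and a `batch` purpose (file name is a placeholder):

```py theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

# Each line of the JSONL file is a single request payload
with open("requests.jsonl", "rb") as f:
    uploaded = portkey.files.create(file=f, purpose="batch")

print(uploaded.id)  # reuse this file ID across batch or fine-tuning jobs
```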
#### Explore the AI Gateway's Multimodal capabilities below:
## Managing Functions and Tools in Prompts
Portkey's Prompt Library supports creating prompt templates with function/tool definitions, as well as letting you set the `tool_choice` param. Portkey will also validate your tool definition on the fly, eliminating syntax errors.
## Supported Providers and Models
The following providers are supported for function calling, with more being added soon. Please raise a [request](/integrations/llms/suggest-a-new-integration) or a [PR](https://github.com/Portkey-AI/gateway/pulls) to add a model or provider to the AI gateway.
| Provider | Models |
| -------- | ------ |
| [OpenAI](/integrations/llms/openai) | gpt-4 series of models, gpt-3.5-turbo series of models |
| [Azure OpenAI](/integrations/llms/azure-openai) | gpt-4 series of models, gpt-3.5-turbo series of models |
| [Anyscale](/integrations/llms/anyscale-llama2-mistral-zephyr) | mistralai/Mistral-7B-Instruct-v0.1, mistralai/Mixtral-8x7B-Instruct-v0.1 |
| [Together AI](/integrations/llms/together-ai) | mistralai/Mixtral-8x7B-Instruct-v0.1, mistralai/Mistral-7B-Instruct-v0.1, togethercomputer/CodeLlama-34b-Instruct |
| [Fireworks AI](/integrations/llms/fireworks) | firefunction-v1, fw-function-call-34b-v0 |
| [Google Gemini](/integrations/llms/gemini) / [Vertex AI](/integrations/llms/vertex-ai) | gemini-1.0-pro, gemini-1.0-pro-001, gemini-1.5-pro-latest |
## Cookbook
[**Here's a detailed cookbook on function calling using Portkey.**](/guides/getting-started/function-calling)
# Image Generation
Source: https://docs.portkey.ai/docs/product/ai-gateway/multimodal-capabilities/image-generation
Portkey's AI gateway supports image generation capabilities that many foundational model providers offer.
The most common use case is that of **text-to-image** where the user sends a prompt which the image model processes and returns an image.
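A minimal text-to-image sketch with the Portkey SDK, following the OpenAI images signature (model, prompt, and virtual key are placeholders):

```py theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="OPENAI_VIRTUAL_KEY")

image = portkey.images.generate(
    model="dall-e-3",
    prompt="A watercolor painting of a lighthouse at dawn",
)
print(image.data[0].url)
```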
## Cookbook
[**Here's a detailed cookbook on image generation using Portkey**](https://github.com/Portkey-AI/portkey-cookbook/blob/main/examples/image-generation.ipynb) which demonstrates the use of multiple providers and routing between them through Configs.
# Speech-to-Text
Source: https://docs.portkey.ai/docs/product/ai-gateway/multimodal-capabilities/speech-to-text
Portkey's AI gateway supports STT models like Whisper by OpenAI.
## Transcription & Translation Usage
Portkey supports both `Transcription` and `Translation` methods for STT models and follows the OpenAI signature where you can send the file (in `flac`, `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `ogg`, `wav`, or `webm` formats) as part of the API request.
Here's an example:
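A minimal transcription sketch with the Python SDK; the translation call is analogous via `audio.translations.create`. File name, model, and virtual key are placeholders:

```py theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="OPENAI_VIRTUAL_KEY")

with open("meeting.mp3", "rb") as audio_file:
    transcript = portkey.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)
```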
## Creating prompt templates for vision models
Portkey's prompt library supports creating templates with image inputs. If the same image will be used in all prompt calls, you can save it as part of the template's image URL itself. Or, if the image will be sent via the API as a variable, add a variable to the image link.
## Supported Providers and Models
Portkey supports all vision models from its integrated providers as they become available. The table below shows some examples of supported vision models. Please raise a [request](/integrations/llms/suggest-a-new-integration) or a [PR](https://github.com/Portkey-AI/gateway/pulls) to add a provider to the AI gateway.
| Provider | Models | Functions |
| ----------------------------------------------- | -------------------------------------------------------------------------------------------------- | ---------------------- |
| [OpenAI](/integrations/llms/openai) | `gpt-4-vision-preview`, `gpt-4o`, `gpt-4o-mini ` | Create Chat Completion |
| [Azure OpenAI](/integrations/llms/azure-openai) | `gpt-4-vision-preview`, `gpt-4o`, `gpt-4o-mini ` | Create Chat Completion |
| [Gemini](/integrations/llms/gemini) | `gemini-1.0-pro-vision `, `gemini-1.5-flash`, `gemini-1.5-flash-8b`, `gemini-1.5-pro` | Create Chat Completion |
| [Anthropic](/integrations/llms/anthropic) | `claude-3-sonnet`, `claude-3-haiku`, `claude-3-opus`, `claude-3.5-sonnet`, `claude-3.5-haiku` | Create Chat Completion |
| [AWS Bedrock](/integrations/llms/aws-bedrock) | `anthropic.claude-3-5-sonnet`, `anthropic.claude-3-5-haiku`, `anthropic.claude-3-5-sonnet-20240620-v1:0` | Create Chat Completion |
For a complete list of all supported providers (including non-vision LLMs), check out our [providers documentation](/integrations/llms).
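For reference, here's a minimal sketch of a vision request through the gateway, using the OpenAI-style `image_url` content part (model, image URL, and virtual key are placeholders):

```py theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="OPENAI_VIRTUAL_KEY")

response = portkey.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```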
# Realtime API
Source: https://docs.portkey.ai/docs/product/ai-gateway/realtime-api
Use OpenAI's Realtime API with logs, cost tracking, and more!
## Next Steps
* [For more info on realtime API, refer here](https://platform.openai.com/docs/guides/realtime)
* [Portkeys OpenAI Integration](/integrations/llms/openai)
* [Logs](/product/observability/logs)
* [Traces](/product/observability/traces)
* [Guardrails](/product/ai-gateway/guardrails)
# Remote MCP
Source: https://docs.portkey.ai/docs/product/ai-gateway/remote-mcp
Portkey's AI gateway supports the remote MCP server capabilities that many foundational model providers offer.
[Model Context Protocol](https://modelcontextprotocol.io/introduction) (MCP) is an open protocol that standardizes how applications provide tools and context to LLMs. The MCP tool in the Responses API allows developers to give the model access to tools hosted on **Remote MCP servers**. These are MCP servers maintained by developers and organizations across the internet that expose these tools to MCP clients, like the Responses API.
Portkey supports using MCP servers via the Responses API. Calling a remote MCP server with the Responses API is straightforward. For example, here's how you can use the [DeepWiki](https://deepwiki.com/) MCP server to ask questions about nearly any public GitHub repository.
## OpenAI Responses API Remote MCP Support
A Responses API request to OpenAI with MCP tools enabled.
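A sketch of what that request can look like with the Python SDK, assuming the Responses API's `mcp` tool type and the public DeepWiki server URL; the model and the SDK's `responses.create` method mirroring OpenAI are assumptions:

```py theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="OPENAI_VIRTUAL_KEY")

response = portkey.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "mcp",
        "server_label": "deepwiki",
        "server_url": "https://mcp.deepwiki.com/mcp",
        "require_approval": "never",
    }],
    input="What transport protocols are supported in the 2025-03-26 version of the MCP spec?",
)
print(response.output_text)
```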
### MCP Server Authentication
Unlike the DeepWiki MCP server, most other MCP servers require authentication. The MCP tool in the Responses API gives you the ability to flexibly specify headers that should be included in any request made to a remote MCP server. These headers can be used to share API keys, OAuth access tokens, or any other authentication scheme the remote MCP server implements.
The most common header used by remote MCP servers is the `Authorization` header. Here's what passing this header looks like when using the Stripe MCP tool:
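A sketch of passing an `Authorization` header to an authenticated server, assuming a `headers` field on the MCP tool definition; the Stripe server URL and key are illustrative:

```py theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="OPENAI_VIRTUAL_KEY")

response = portkey.responses.create(
    model="gpt-4.1",
    tools=[{
        "type": "mcp",
        "server_label": "stripe",
        "server_url": "https://mcp.stripe.com",  # illustrative URL
        "headers": {"Authorization": "Bearer $STRIPE_API_KEY"},
        "require_approval": "never",
    }],
    input="Create a payment link for a $20 product",
)
print(response.output_text)
```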
To use the required deployment, simply pass the `alias` of the deployment as the `model` in the LLM request body. If the model is left empty or the specified alias does not exist, the default deployment is used.
## How are the provider API keys stored?
Your API keys are encrypted and stored in secure vaults, accessible only at the moment of a request. Decryption is performed exclusively in isolated workers and only when necessary, ensuring the highest level of data security.
## How are the provider keys linked to the virtual key?
We randomly generate virtual keys and link them separately to the securely stored keys. This means your raw API keys cannot be reverse-engineered from the virtual keys.
## Using Virtual Keys
### Using the Portkey SDK
Add the virtual key directly to the initialization configuration for Portkey.
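A minimal sketch (key values are placeholders):

```py theme={"system"}
from portkey_ai import Portkey

# The virtual key resolves to the securely stored provider credentials
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="VIRTUAL_KEY",
)

response = portkey.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```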
For more details, see [Bring Your Own LLM](/product/ai-gateway/byollm).
## Setting Budget Limits
Portkey provides a simple way to set budget limits for any of your virtual keys and helps you manage your spending on AI providers (and LLMs) - giving you confidence and control over your application's costs.
[Budget Limits](/product/ai-gateway/virtual-keys/budget-limits)
## Prompt Templates
Choose your Virtual Key within Portkey’s prompt templates, and it will be automatically retrieved and ready for use.
## Langchain / LlamaIndex
Set the virtual key when utilizing Portkey's custom LLM as shown below:
```py theme={"system"}
# Example in Langchain
llm = PortkeyLLM(api_key="PORTKEY_API_KEY", provider="@PROVIDER")
```
# Connect Bedrock with Amazon Assumed Role
Source: https://docs.portkey.ai/docs/product/ai-gateway/virtual-keys/bedrock-amazon-assumed-role
How to create a new integration for Bedrock using Amazon Assumed Role Authentication
## Create an AWS Role for Portkey to Assume
This role you create will be used by Portkey to execute InvokeModel commands on Bedrock models in your AWS account. The setup process will establish a minimal-permission ("least privilege") role and set it up to allow Portkey to assume this role.
### Create a permission policy in your AWS account using the following JSON
```json theme={"system"}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockConsole",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    }
  ]
}
```
### Create a new IAM role
Choose *AWS account* as the trusted entity type. If you set an external ID, be sure to copy it; we will need it later.
### Add the above policy to the role
Search for the policy you created above and add it to the role.
### Configure Trust Relationship for the role
Once the role is created, open the role and navigate to the *Trust relationships* tab and click *Edit trust policy*.
This is where you will add the Portkey AWS account as a trusted entity.
```sh Portkey Account ARN theme={"system"}
arn:aws:iam::299329113195:role/portkey-app
```
You're all set! You can now use the providers inherited from your integration to invoke Bedrock models.
# Budget Limits
Source: https://docs.portkey.ai/docs/product/ai-gateway/virtual-keys/budget-limits
Budget Limits lets you set cost or token limits on providers/integrations
### Reset Period Options
* **No Periodic Reset**: The budget limit applies until exhausted with no automatic renewal
* **Reset Weekly**: Budget limits automatically reset every week
* **Reset Monthly**: Budget limits automatically reset every month
### Reset Timing
* Weekly resets occur at the beginning of each week (Sunday at 12 AM UTC)
* Monthly resets occur on the **1st** calendar day of the month, at **12 AM UTC**, irrespective of when the budget limit was originally set
## Editing Budget Limits
If you need to change or update a budget limit, you can **duplicate** the existing provider and create a new one with the desired limit.
## Monitoring Your Spending and Usage
You can track your spending and token usage for any specific provider by navigating to the Analytics tab and filtering by the **desired provider** and **timeframe**.
## Pricing Support and Limitations
Budget limits currently apply to all providers and models for which Portkey has pricing support. If a specific request log shows `0 cents` in the COST column, it means that Portkey does not currently track pricing for that model, and it will not count towards the provider's budget limit.
For token-based budgets, Portkey tracks both input and output tokens across all supported models.
It's important to note that budget limits cannot be applied retrospectively. The spend counter starts from zero only after you've set a budget limit for a key.
## Availability
Budget Limits is currently available **exclusively to Portkey Enterprise** customers and select Pro users. If you're interested in enabling this feature for your account, please reach out to us at [support@portkey.ai](mailto:support@portkey.ai) or join the [Portkey Discord](https://portkey.ai/community) community.
## Enterprise Plan
To discuss Portkey Enterprise plan details and pricing, [you can schedule a quick call here](https://portkey.sh/demo-16).
# Rate Limits
Source: https://docs.portkey.ai/docs/product/ai-gateway/virtual-keys/rate-limits
Set Rate Limits on your Integrations/Providers
Rate Limits lets you set request or token consumption limits on providers/integrations
> #### Key Considerations
>
> * Rate limits can be set as either request-based or token-based
> * Time intervals can be configured as per minute, per hour, or per day
> * Setting the limit to 0 disables the provider
> * Rate limits apply immediately after being set
> * Once set, rate limits **cannot be edited** by any organization member
> * Rate limits work for **all providers** available on Portkey and apply to **all organization members** who use the provider
> * After a rate limit is reached, requests will be rejected until the time period resets
## Rate Limit Intervals
You can choose from three different time intervals for your rate limits:
* **Per Minute**: Limits reset every minute, ideal for fine-grained control
* **Per Hour**: Limits reset hourly, providing balanced usage control
* **Per Day**: Limits reset daily, suitable for broader usage patterns
## Exceeding Rate Limits
When a rate limit is reached:
* Subsequent requests are rejected with a specific error code
* Error messages clearly indicate that the rate limit has been exceeded
* The limit automatically resets after the specified time period has elapsed
## Editing Rate Limits
If you need to change or update a rate limit, you can **duplicate** the existing provider and create a new one with the desired limit.
## Monitoring Your Usage
You can track your request and token usage for any specific provider by navigating to the Analytics tab and filtering by the **desired provider** and **timeframe**.
## Use Cases for Rate Limits
* **Cost Control**: Prevent unexpected usage spikes that could lead to high costs
* **Performance Management**: Ensure your application maintains consistent performance
* **Fairness**: Distribute API access fairly across teams or users
* **Security**: Mitigate potential abuse or DoS attacks
* **Provider Compliance**: Stay within the rate limits imposed by underlying AI providers
## Availability
Rate Limits is currently available **exclusively to Portkey Enterprise** customers and select Pro users. If you're interested in enabling this feature for your account, please reach out to us at [support@portkey.ai](mailto:support@portkey.ai) or join the [Portkey Discord](https://portkey.ai/community) community.
## Enterprise Plan
To discuss Portkey Enterprise plan details and pricing, [you can schedule a quick call here](https://portkey.sh/demo-16).
# Enterprise Offering
Source: https://docs.portkey.ai/docs/product/enterprise-offering
This hierarchy allows for efficient management of resources and access control across your organization. At the top level, you have your Account, which can contain one or more Organizations. Each Organization can have multiple Workspaces, providing a way to separate teams, projects, or departments within your company.
### JWKS Configuration
To validate JWTs, you must configure one of the following:
* **JWKS URL**: A URL from which the public keys will be dynamically fetched.
* **JWKS JSON**: A static JSON containing public keys.
## Hard Requirements (Read First)
### JWT Header (JOSE Header)
* `alg`: Must be `RS256`. Symmetric algorithms like `HS256` are not accepted.
* `typ`: Must be `JWT`.
* `kid`: Required. The value in the JWT header must match a `kid` in your JWKS.
### Key Requirements
* Key type: RSA
* Key size: 2048 bits or higher
* Your JWKS must expose only the public key parameters (e.g., `kty`, `n`, `e`, `use`, `alg`, `kid`). Do not include private key material.
## JWT Requirements
### Supported Algorithm
* JWTs must be signed using **RS256** (RSA Signature with SHA-256).
### Required Claims
Your JWT payload must contain the following claims:
| **Claim Key** | **Description** |
| -------------------------------------- | -------------------------------------------------- |
| `portkey_oid` / `organisation_id` | Unique identifier for the organization. |
| `portkey_workspace` / `workspace_slug` | Identifier for the workspace. |
| `scope` / `scopes` | Permissions granted by the token. |
| `exp` | Expiration time (as a UNIX timestamp, in seconds). |
* `exp` is mandatory. Tokens without `exp` or with expired `exp` are rejected.
* `iat` and/or `nbf` are recommended but optional.
### User Identification
Portkey identifies users in the following order of precedence for logging and metrics:
1. `email_id`
2. `sub`
3. `uid`
## End-to-End Working Example (Generate → Configure JWKS → Sign → Call)
The following example uses Node.js and the `jose` library to:
1. generate an RSA key pair,
2. create a JWKS containing the public key,
3. sign a JWT with the private key,
4. call Portkey with the JWT.
### 1) Prerequisites
```sh theme={"system"}
# Node 18+ recommended
npm init -y
npm install jose
```
### 2) Generate RSA Keys, Create JWKS, and Sign a JWT (NodeJS)
Create `generate-and-sign-jwt.mjs`:
```js theme={"system"}
import { generateKeyPair, exportJWK, SignJWT } from 'jose';
import { randomUUID } from 'node:crypto';
import fs from 'node:fs';
const { publicKey, privateKey } = await generateKeyPair('RS256');
// Create a public JWK for JWKS
const publicJwk = await exportJWK(publicKey);
publicJwk.kty = 'RSA';
publicJwk.use = 'sig';
publicJwk.alg = 'RS256';
publicJwk.kid = randomUUID();
const jwks = { keys: [publicJwk] };
fs.writeFileSync('jwks.json', JSON.stringify(jwks, null, 2));
const now = Math.floor(Date.now() / 1000);
// Sign a JWT with the private key
const jwt = await new SignJWT({
  // Placeholder claim values: replace with your organisation's real identifiers and scopes
  portkey_oid: 'your-organisation-id',
  portkey_workspace: 'your-workspace-slug',
  scope: 'your-scopes',
})
  .setProtectedHeader({ alg: 'RS256', typ: 'JWT', kid: publicJwk.kid })
  .setIssuedAt(now)
  .setExpirationTime(now + 60 * 60) // `exp` is mandatory; 1 hour here
  .sign(privateKey);

console.log('JWT:', jwt);
```
2. Complete the required fields to create a new application.
3. Once the application is created, navigate to the application's **Provisioning** page under the **Manage** section.
4. Click **`New Configuration`** to go to the provisioning settings page.
5. Obtain the **Tenant URL** and **Secret Token** from the Portkey Admin Settings page (if SCIM is enabled for your organization).
* [Portkey Settings Page](https://app.portkey.ai/settings/organisation/sso)
6. Fill in the values from the Portkey dashboard in Entra's provisioning settings and click **`Test Connection`**. If successful, click **`Create`**.
> If the test connection returns any errors, please contact us at [support@portkey.ai](mailto:support@portkey.ai).
***
##### Application Roles
Portkey-supported roles should match Entra's application roles.
1. Navigate to **App Registrations** under **Enterprise Applications**, click **All Applications**, and select the application created earlier.
2. Go to the **App Roles** page and click **`Create app role`**.
> Portkey supports three application-level roles:
>
> * **`member`** (Organization Member)
> * **`admin`** (Organization Admin)
> * **`owner`** (Organization Owner)
> Users assigned any other role will default to the **member** role.
3. To support group roles, create a role with the value **`group`** and a name in title-case (e.g., `Group` for the value `group`).
4. Assign users to the application with the desired role (e.g., **`owner`**, **`member`**, or **`admin`**) for the organization.
***
#### Attribute Mapping
###### Adding a New Attribute
1. Go to the **Provisioning** page and click **Attribute Mapping (Preview)** to access the attributes page.
2. Enable advanced options and click **`Edit attribute list for customappsso`**.
3. Add a new attribute called **`roles`** with the following properties:
* **Multi-valued:** Enabled
* **Type:** String
###### Adding a new mapping
1. Click on the **`Add new mapping`** link to add a new mapping (refer to the images above).
2. Use the values from the image below to add the new mapping.
3. Once done, save the changes.
###### Removing Unnecessary Attributes
Delete the following unsupported attributes:
* **preferredLanguage**
* **addresses (all fields)**
* **phoneNumbers**
***
#### Updating Attributes
**Update `displayName`**
1. Edit the **`displayName`** field to concatenate `firstName + lastName` instead of using the default `displayName` value from Entra records.
2. Save the changes and enable provisioning on the **Overview** page of the provisioning settings.
***
##### Group (Workspace) Provisioning
Portkey supports RBAC (Role-Based Access Control) for workspaces mapped to groups in Entra. Use the following naming convention for groups:
* **Format:** `ws-{group}-role-{role}`
* **Role:** One of `admin`, `member`, or `manager`
* A user should belong to only one group per `{group}`.
**Example:**
For a `Sales` workspace:
* `ws-Sales-role-admin`
* `ws-Sales-role-manager`
* `ws-Sales-role-member`
Users assigned to these groups will inherit the corresponding role in Portkey.
***
### Support
If you face any issues with group provisioning, please reach out to us at [support@portkey.ai](mailto:support@portkey.ai).
# Okta
Source: https://docs.portkey.ai/docs/product/enterprise-offering/org-management/scim/okta
Set up Okta for SCIM provisioning with Portkey.
Portkey supports provisioning Users & Groups with Okta SAML Apps.
3. Fill in the values from the Portkey dashboard into Okta's provisioning settings and click **`Test Connection`**. If successful, click **`Save`**.
5. Once the details are saved, you will see two more options alongside the integration, namely `To App` and `To Okta`.
Select `To App` to configure provisioning from Okta to Portkey.
Enable the following checkboxes:
* Create Users
* Update User Attributes
* Deactivate Users
After saving the settings, the application header should resemble the following image.
This completes the SCIM provisioning settings between Okta and Portkey.
Whenever you assign a `User` or `Group` to the application, Okta automatically pushes the updates to Portkey.
***
### Organisation role support
Portkey supports the following organisation roles:
* **`owner`** (Organization Owner)
* **`admin`** (Organization Admin)
* **`member`** (Organization Member)
Users assigned any other role will default to the **member** role.
#### Editing Attributes
Okta by default doesn't support role attributes. To support role attributes, you need to edit the attributes in Okta.
1. Navigate to the app settings. Under general settings, click on the `Provisioning` tab.
2. Click on the `Go to Profile Editor` button, found under **Attribute Mappings** section.
3. Click on the `Add Attribute` button.
4. Fill the form with the following details:
5. Click on the `Save` button.
#### Verifying the changes
To verify the changes, you can assign a user to the application with the desired role (e.g., **`owner`**, **`member`**, or **`admin`**) for the organization.
2. Select **Find group by name**.
3. Enter the name of the group, select the group from the list, and click **Save** or **Save & Add Another** to assign a new group.
Resources such as Prompts, Prompt Partials, Providers, Configs, and Guardrails must be deleted before removing a workspace.
Once all resources have been removed, enter the workspace name in the confirmation field to proceed with deletion.
Let's see in detail below:
Each Guardrail Check has a custom input field based on its use case — just add the relevant details to the form and save your check.
### There are 6 Types of Guardrail Actions
| Action | State | Description | Impact |
| :------------- | :----------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| **Async** | **TRUE** (default state) | Run the Guardrail checks **asynchronously** along with the LLM request. | Adds no latency to your request. Useful when you only want to log guardrail checks without affecting the request. |
| **Async** | **FALSE** | **On Request**: Run the Guardrail check **BEFORE** sending the request to the **LLM**. **On Response**: Run the Guardrail check **BEFORE** sending the response to the **user**. | Adds latency to the request. Useful when your Guardrail is critical and you want more orchestration over your request based on the Guardrail result. |
| **Deny** | **TRUE** | **On Request & Response**: If any of the Guardrail checks **FAIL**, the request will be killed with a **446** status code. If all of the Guardrail checks **SUCCEED**, the request/response will be sent onward with a **200** status code. | Useful when your Guardrails are critical and the request cannot proceed if they fail. We advise running this action on a subset of your requests first to see the impact. |
| **Deny** | **FALSE** (default state) | **On Request & Response**: If any of the Guardrail checks **FAIL**, the request will STILL be sent, but with a **246** status code. If all of the Guardrail checks **SUCCEED**, the request/response will be sent onward with a **200** status code. | Useful when you want to log the Guardrail result but do not want it to affect your result. |
| **On Success** | **Send Feedback** | If **all of the** Guardrail checks **PASS**, append your custom-defined feedback to the request. | We recommend setting up this action. It helps you build an "Evals dataset" of Guardrail results on your requests over time. |
| **On Failure** | **Send Feedback** | If **any of the** Guardrail checks **FAIL**, append your custom feedback to the request. | We recommend setting up this action. It helps you build an "Evals dataset" of Guardrail results on your requests over time. |
Set the relevant actions you want with your checks, name your Guardrail and save it! When you save the Guardrail, you will get an associated `$Guardrail_ID` that you can then add to your request.
***
## 3. "Enable" the Guardrails through Configs
This is where Portkey's magic comes into play. The Guardrail you created above is not yet an `Active` guardrail because it is not attached to any request.
Configs is one of Portkey's most powerful features and is used to define all kinds of request orchestration - everything from caching, retries, fallbacks, timeouts, to load balancing.
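For instance, a config can reference the Guardrail ID from the previous step in its request/response hooks, mirroring the hook syntax used elsewhere in these docs; the IDs below are placeholders:

```py theme={"system"}
guardrail_config = {
    "before_request_hooks": [
        {"id": "your-guardrail-id"}   # checks run on the input
    ],
    "after_request_hooks": [
        {"id": "your-guardrail-id"}   # checks run on the output
    ],
}
```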
***
## Understanding Guardrail Response Structure
When Guardrails are enabled and configured to run synchronously (`async=false`), Portkey adds a `hook_results` object to your API responses. This object provides detailed information about the guardrail checks that were performed and their outcomes.
### Hook Results Structure
The `hook_results` object contains two main sections:
```json theme={"system"}
"hook_results": {
  "before_request_hooks": [...],  // Guardrails applied to the input
  "after_request_hooks": [...]    // Guardrails applied to the output
}
```
Each section contains an array of guardrail execution results, structured as follows:
## Get Support
If you're implementing guardrails for embeddings and need assistance, reach out to the Portkey team on the [community forum](https://discord.gg/portkey-llms-in-prod-1143393887742861333).
## Learn More
* [Portkey Guardrails Overview](/product/guardrails)
* [List of Guardrail Checks](/product/guardrails/list-of-guardrail-checks)
* [Creating Raw Guardrails in JSON](/product/guardrails/creating-raw-guardrails-in-json)
# List of Guardrail Checks
Source: https://docs.portkey.ai/docs/product/guardrails/list-of-guardrail-checks
Each Guardrail Check has a specific purpose, its own parameters, supported hooks, and sources.
## Partner Guardrails
## Guardrails Support
PII redaction is supported across 5 guardrail providers:
1. **Navigate to Guardrails**: Go to the `Guardrails` page and click `Create`
2. **Select Regex Match**: Choose the "Regex Match" guardrail from the BASIC category
3. **Configure the Pattern**:
* **Regex Rule**: Enter your regex pattern to match specific PII (e.g., `\b\d{3}-\d{2}-\d{4}\b` for SSN patterns)
* **Replacement Text**: Define what to replace matches with (e.g., `[REDACTED]`, `*****`, `[SSN_HIDDEN]`)
* **Enable Redact**: Toggle the "Redact" option to `ON`
* **Inverse**: Keep this `OFF` unless you want to redact everything except the pattern
4. **Save the Guardrail**: Name your guardrail and save it to get the associated Guardrail ID
### Common Regex Patterns for PII
| PII Type | Regex Pattern | Replacement Example |
| -------- | ------------- | ------------------- |
| Credit Card | `\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b` | `[CREDIT_CARD]` |
| Social Security Number | `\b\d{3}-\d{2}-\d{4}\b` | `[SSN_REDACTED]` |
| Phone Numbers | `\b\d{3}[-.]\d{3}[-.]\d{4}\b` | `[PHONE_HIDDEN]` |
| Email Addresses | `\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z\|a-z]{2,}\b` | `[EMAIL_REDACTED]` |
| Custom Employee IDs | `EMP-\d{6}` | `[EMPLOYEE_ID]` |
### Adding to Your Config
Once you've created your custom PII regex guardrail, add it to your Portkey config:
```json theme={"system"}
{
  "before_request_hooks": [
    {"id": "your-custom-pii-guardrail-id"}
  ],
  "after_request_hooks": [
    {"id": "your-custom-pii-guardrail-id"}
  ]
}
```
## Security Architecture
The gateway implements defense-in-depth security:
1. **Client Authentication**: OAuth 2.1 tokens validated on every request
2. **Authorization**: Scope-based access control for MCP operations
3. **Token Isolation**: Client tokens never forwarded to upstream servers
4. **Session Security**: Cryptographically secure session IDs with token-aligned expiration
5. **Transport Security**: TLS encryption for all connections
6. **Audit Logging**: Complete request/response audit trail
# Deployment
Source: https://docs.portkey.ai/docs/product/mcp-gateway/deployment
## via Portkey Cloud
Portkey hosts the MCP Gateway on our edge network available through `https://mcp.portkey.ai`
The control plane is available at `https://app.portkey.ai`, giving you a UI-friendly way to manage, observe, and govern MCP servers and their usage.
## Self-hosted Deployments
For enterprises self-hosting the gateway, the MCP gateway is packaged in the same AI gateway docker deployment.
MCP and AI gateways are exposed through different ports, making it easy to map them to different base URLs, which is advisable.
We recommend URLs like `https://ai.yourcompanyname.com` for the LLM gateway and `https://mcp.yourcompanyname.com` for the MCP gateway.
The helm chart contains variables to run both or individual servers in your deployment.
# MCP Clients
Source: https://docs.portkey.ai/docs/product/mcp-gateway/mcp-clients
Portkey makes it simple to integrate with MCP clients by connecting directly to your MCP servers and handling authentication automatically.
With Portkey, you can connect any server registered on Portkey’s MCP Hub using just a URL—no infrastructure setup, no secret management.
***
## Connecting your MCP Server
Once you’ve connected your MCP server to Portkey, Portkey provides you with a ready-to-use URL. For example:
```
https://mcp.portkey.ai/:workspace-id/:mcp-server-id/mcp
```
That’s all you need to connect Portkey to any MCP client.
***
You can simply add your MCP server configuration using JSON in your desired application:
Add this configuration to your Claude or other MCP client settings.
For example, `@openai-prod/gpt-4o`, `@anthropic/claude-3-sonnet`, `@bedrock-us/claude-3-sonnet-v1`
Once you create an Integration (by storing your credentials), you can use it to create multiple AI Providers. For example, you might have one OpenAI Integration, but create three different AI Providers from it:
* `@openai-dev` for development with strict rate limits
* `@openai-staging` for testing with moderate budgets
* `@openai-prod` for production with higher limits
This separation gives you granular control over how different teams and environments use the same underlying credentials.
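Requests then reference the provider slug together with the model, so switching environments is just a slug change. A minimal sketch using the example slugs above:

```py theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

# Development traffic goes through the rate-limited dev provider...
dev_response = portkey.chat.completions.create(
    model="@openai-dev/gpt-4o",
    messages=[{"role": "user", "content": "Hello from dev"}],
)

# ...while production uses the same underlying credentials with higher limits
prod_response = portkey.chat.completions.create(
    model="@openai-prod/gpt-4o",
    messages=[{"role": "user", "content": "Hello from prod"}],
)
```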
These limits cascade down to all AI Providers created from that Integration.
This hierarchical approach ensures teams only have access to the resources they need.
Model provisioning helps you maintain consistency and control costs across your organization.
## Create an AWS Role for Portkey to Assume
This role you create will be used by Portkey to execute InvokeModel commands on Bedrock models in your AWS account. The setup process will establish a minimal-permission ("least privilege") role and set it up to allow Portkey to assume this role.
### Create a permission policy in your AWS account using the following JSON
```json theme={"system"}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockConsole",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "*"
    }
  ]
}
```
### Create a new IAM role
Choose *AWS account* as the trusted entity type. If you set an external ID, be sure to copy it; we will need it later.
### Add the above policy to the role
Search for the policy you created above and add it to the role.
### Configure Trust Relationship for the role
Once the role is created, open the role and navigate to the *Trust relationships* tab and click *Edit trust policy*.
This is where you will add the Portkey AWS account as a trusted entity.
```sh Portkey Account ARN theme={"system"}
arn:aws:iam::299329113195:role/portkey-app
```
You're all set! You can now use the new provider to invoke Bedrock models.
# Adding Custom Models
Source: https://docs.portkey.ai/docs/product/model-catalog/custom-models
Learn how to add custom and fine-tuned models to your Portkey Model Catalog for seamless integration.
Portkey's Model Catalog is designed to be a central hub for all the models you use, providing a unified interface for both supported and private models. By adding your custom or fine-tuned models to the catalog, you can seamlessly integrate them into your applications, leverage Portkey's unified API signature, and benefit from centralized logging, monitoring, and management.
This is especially useful for:
* **Fine-tuned Models**: Integrating your fine-tuned versions of popular models (e.g., `ft:gpt-4.1-nano`).
* **Internal/Proprietary Models**: Using your own in-house models through the Portkey gateway.
* **Models Not Natively Supported**: Adding models from providers that Portkey doesn't yet have a direct integration with, while mapping them to a compatible API signature.
## Adding a New Custom Model
You can add a custom model directly from your Portkey dashboard.
1. Navigate to the relevant [**Integration**](https://app.portkey.ai/integrations) in your Portkey account and select the Integration to which you want to add a custom model.
2. Select the **Model Provisioning** step.
3. Click the **Add Model** button on the top-right corner of the page.
This will open the **Add Custom Model** form.
### Configuration Fields
Here’s a breakdown of each field in the form:
* API Key
* Optional: Organization ID, Project ID
**For AWS Bedrock:**
* AWS Access Key ID
* AWS Secret Access Key
* AWS Region
1. In your Integration settings, navigate to **Workspace Provisioning**
2. Select which workspaces should have access:
* **All Workspaces**: Grants access to every workspace in your organization
* **Specific Workspaces**: Choose individual workspaces that need access
3. For each workspace, click the `Edit Budget & Rate Limits` button
1. In your Integration settings, navigate to **Model Provisioning**
2. Select the configuration options:
* **Allow All Models**: Provides access to all models offered by the provider
* **Allow Specific Models**: Create an allowlist of approved models
#### Advanced Model Management
### Custom Models
The Model Catalog isn't limited to standard provider models. You can add:
* **Fine-tuned models**: Your custom OpenAI or Anthropic fine-tunes
* **Self-hosted models**: Models running on your infrastructure
* **Private models**: Internal models not publicly available
Each custom model gets the same governance controls as standard models.
### Budget Limits
**Budget Limits on Integrations** provide a simple way to manage your spending on AI providers (and LLMs) - giving you confidence and control over your application's costs. They act as financial guardrails, preventing unexpected AI costs across your organization. These limits cascade down to all AI Providers created from this Integration.
#### Setting Budget Limits
1. In your Integration settings, navigate to **Workspace Provisioning**
2. Select which workspaces should have access:
* **All Workspaces**: Grants access to every workspace in your organization
* **Specific Workspaces**: Choose individual workspaces that need access
3. Click the `Edit Budget & Rate Limits` button
**Reset Period Options:**
* **No Periodic Reset**: The budget limit applies until exhausted with no automatic renewal
* **Reset Weekly**: Budget limits automatically reset every week
* **Reset Monthly**: Budget limits automatically reset every month
**Reset Timing:**
* Weekly resets occur at the beginning of each week (Sunday at 12 AM UTC)
* Monthly resets occur on the **1st** calendar day of the month, at **12 AM UTC**, irrespective of when the budget limit was originally set
### Rate Limits
Rate limits control the velocity of API usage, protecting against runaway processes and ensuring fair resource distribution across teams.
#### Setting Rate Limits
1. In your Integration settings, navigate to **Workspace Provisioning**
2. Select which workspaces should have access:
* **All Workspaces**: Grants access to every workspace in your organization
* **Specific Workspaces**: Choose individual workspaces that need access
3. Click the `Edit Budget & Rate Limits` button
# Overriding Model Details
Source: https://docs.portkey.ai/docs/product/model-catalog/model-overrides
Learn how to set custom pricing for any base model in your Portkey Model Catalog to reflect your specific costs.
Portkey's Model Catalog automatically includes the standard pay-as-you-go pricing for supported models, which is used to calculate costs on your dashboard. However, you may have different pricing arrangements with your model providers, such as negotiated discounts, committed use contracts, or your own internal pricing for private models.
To ensure your cost and usage analytics are accurate, Portkey allows you to override the default pricing for any model in the catalog.
## How to Override Model Pricing
You can change the pricing for any model directly from your Portkey dashboard.
1. Navigate to the relevant [**Integration**](https://app.portkey.ai/integrations) in your Portkey account and select the Integration whose model pricing you want to change.
2. Select the **Model Provisioning** step.
3. Find the model whose pricing you want to change in the list.
4. Click the **Edit** (pencil) icon on the right side of the model's row.
This will open a form where you can set your custom pricing.
### Pricing Fields
When you edit a model, you can set the following pricing values:
## Features
Portkey records all your multimodal requests and responses, making it easy to view, monitor, and debug interactions.
Portkey supports request tracing to help you monitor your applications throughout the lifecycle of a request.
A comprehensive view of 21+ key metrics. Use it to analyze data, spot trends, and make informed decisions.
Streamline your data view with customizable filters. Zero in on data that matters most.
Enrich your LLM APIs with custom metadata. Assign unique tags for swift grouping and troubleshooting.
Add feedback values and weights to complete the loop.
Set up budget limits for your provider API keys and gain confidence over your application's costs.
## Charts
The dashboard provides insights into your [users](/product/observability/analytics#users), [errors](/product/observability/analytics#errors), [cache](/product/observability/analytics#cache), [feedback](/product/observability/analytics#feedback) and also summarizes information by [metadata](/product/observability/analytics#metadata-summary).
### Overview
The overview tab is a 70,000ft view of your application's performance. This highlights the cost, tokens used, mean latency, requests and information on your users and top models.
This is a good starting point to then dive deeper.
### Users
The users tab provides an overview of the user information associated with your Portkey requests. This data is derived from the `user` parameter in OpenAI SDK requests or the special `_user` key in the Portkey [metadata header](/product/observability/metadata).
Portkey currently does not provide analytics on usage patterns for individual team members in your Portkey organization. The users tab is designed to track end-user behavior in your application, not internal team usage.
### Errors
Portkey captures errors automatically for API and Accuracy errors. The charts give you a quick sense of error rates allowing you to debug further when needed.
The dashboard also shows you the number of requests rescued by Portkey through the various AI gateway strategies.
### Cache
When you enable cache through the AI gateway, you can view data on the latency improvements and cost savings due to cache.
### Feedback
Portkey allows you to collect feedback on LLM requests through the logs dashboard or via API. You can view analytics on this feedback collected on this dashboard.
### Metadata Summary
Group your request data by metadata parameters to unlock insights on usage. Select the metadata property to use in the dropdown and view the request data grouped by values of that metadata parameter.
This lets you answer questions like:
1. Which users are we spending the most on?
2. Which organisations have the highest latency?
# Auto-Instrumentation [BETA]
Source: https://docs.portkey.ai/docs/product/observability/auto-instrumentation
Portkey's auto-instrumentation allows you to instrument tracing and logging for multiple LLM/Agent frameworks and view the logs, traces, and metrics in a single place.
## Supported Frameworks
We currently support auto-instrumentation for the following frameworks:
* [CrewAI](/integrations/agents/crewai#auto-instrumentation)
* [LangGraph](/integrations/agents/langgraph#auto-instrumentation)
## Saved Filters
Quickly access your frequently used filter combinations with the `Saved Filters` feature. Save any set of filters directly from the search bar on the Logs or Analytics pages. Saved filters allow you to instantly apply complex filter rules without retyping them every time.
Saved filters are accessible to all organization members, who can also edit, rename, or delete them as needed. Share saved filters with teammates to streamline collaboration and ensure everyone has access to the right data views.
# Logs
Source: https://docs.portkey.ai/docs/product/observability/logs
The Logs section presents a chronological list of all the requests processed through Portkey.
## Share Logs with Teammates
Each log on Portkey has a unique URL. You can copy the link from the address bar and directly share it with anyone in your org.
## Request Status Guide
The Status column on the Logs page gives you a snapshot of the gateway activity for every request.
Portkey’s gateway features—[Cache](/product/ai-gateway/cache-simple-and-semantic), [Retries](/product/ai-gateway/automatic-retries), [Fallback](/product/ai-gateway/fallbacks), [Loadbalance](/product/ai-gateway/load-balancing) are tracked here with their exact states (`disabled`, `triggered`, etc.), making it a breeze to monitor and optimize your usage.
**Common Queries Answered:**
* **Is the cache working?**: Enabled caching but unsure if it's active? The Status column will confirm it for you.
* **How many retries happened?**: Curious about the retry count for a successful request? See it at a glance.
* **Fallback and Loadbalance**: Want to know if load balance is active or which fallback option was triggered? See it at a glance.
| Option | 🔴 Inactive State | 🟢 Possible Active States |
| --------------- | --------------------- | ------------------------------------------------------- |
| **Cache**       | Cache Disabled        | Cache Miss, Cache Refreshed, Cache Hit, Cache Semantic Hit |
| **Retry**       | Retry Not Triggered   | Retry Success on {x} Tries, Retry Failed                   |
| **Fallback** | Fallback Disabled | Fallback Active |
| **Loadbalance** | Loadbalancer Disabled | Loadbalancer Active |
## Manual Feedback
As you're viewing logs, you can also add manual feedback on the logs to be analysed and filtered later. This data can be viewed on the [feedback analytics dashboards](/product/observability/analytics#feedback).
## Configs & Prompt IDs in Logs
If your request has an attached [Config](/product/ai-gateway/configs) or if it's originating from a [prompt template](/product/prompt-library), you can see the relevant Config or Prompt IDs separately in the log's details on Portkey. And to dig deeper, you can just click on the IDs and Portkey will take you to the respective Config or Prompt playground where you can view the full details.
## Debug Requests with Log Replay
You can rerun any buggy request with just one click, straight from the log details page. The `Replay` button opens your request in a fresh prompt playground where you can rerun the request and edit it right there until it works.
# Logs Export
Source: https://docs.portkey.ai/docs/product/observability/logs-export
Easily access your Portkey logs data for further analysis and reporting
4. After setting your parameters, click **Request Export**.
## Available Export Fields
When configuring your log export, you can select from the following fields:
| Field Name | Description |
| --------------- | ------------------------------------------- |
| ID | Unique identifier for the log entry |
| Trace ID | Identifier for tracing related requests |
| Created At | Timestamp of the request |
| Request | Request JSON payload |
| Response | Response JSON payload |
| AI Provider | Name of the AI provider used |
| AI Model | Name of the AI model used |
| Request Tokens | Number of tokens in the request |
| Response Tokens | Number of tokens in the response |
| Total Tokens | Total number of tokens (request + response) |
| Cost | Cost of the request in cents (USD) |
| Cost Currency | Currency of the cost (USD) |
| Response Time | Response time in milliseconds |
| Status Code | HTTP response status code |
| Config | Config ID used for the request |
| Prompt Slug | Prompt ID used for the request |
| Metadata | Custom metadata key-value pairs |
5. Once your export is processed, you'll see it in the exports list with a status indicator:
* **Draft**: Export job created but not yet started
* **Success**: Export completed successfully
* **Failure**: Export job failed. Click on the `Start Again` button to retry the job.
6. Click the **Start** button on the dashboard to start the logs export job.
7. For completed exports, click the **Download** button to get your logs data file.
### Request Logs
You can also apply any metadata filters to the logs or analytics and filter data by any metadata key you've used:
## Enterprise Features
For enterprise users, Portkey offers advanced metadata governance and lets you define metadata at multiple levels:
1. **Request level** - Applied to a single request
2. **API key level** - Applied to all requests using that key
3. **Workspace level** - Applied to all requests in a workspace
## Why Use OpenTelemetry with Portkey?
Portkey's OTel backend is compatible with any OTel-compliant library. Here are a few popular ones for GenAI and general application observability:
## How Tracing Works
Portkey implements OpenTelemetry-compliant tracing. When you include a `trace ID` with your requests, all related LLM calls are grouped together in the Traces View, appearing as "spans" within that trace.
> "Span" is another word for subgrouping of LLM calls. Based on how you instrument, it can refer to another group within your trace or to a single LLM call.
## Trace Tree Structure
Portkey uses a tree data structure for tracing, **similar to OTel.**
Each node in the tree is a span with a unique `spanId` and optional `spanName`. Child spans link to a parent via the `parentSpanId`. Parentless spans become root nodes.
```
traceId
├─ parentSpanId
│ ├─ spanId
│ ├─ spanName
```
| Key - Node | Key - Python | Expected Value | Required? |
| ------------ | ---------------- | -------------- | --------- |
| traceId | trace\_id | Unique string | YES |
| spanId | span\_id | Unique string | NO |
| spanName | span\_name | string | NO |
| parentSpanId | parent\_span\_id | Unique string | NO |
***
## Enabling Tracing
You can enable tracing by passing the `trace tree` values while making your request (or while instantiating your client).
Based on these values, Portkey will instrument your requests, and will show the exact trace with its spans on the "Traces" view in Logs page.
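For example, here's a minimal sketch (trace and span values are placeholders), assuming the Python client accepts the keys from the table above as keyword arguments:

```python theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="openai-prod",     # placeholder
    trace_id="checkout-flow-812",  # groups related calls into one trace
    span_id="span-llm-1",
    span_name="summarize-cart",
)

response = portkey.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize my cart"}],
)
```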
## Library Organization
Prompts can be organized into folders for better categorization. For example, you might create separate folders for:
* Customer service prompts
* Content generation prompts
* Data analysis prompts
* Agent-specific prompts
## Creating New Prompts
To add a new prompt to your library:
1. Click the "Create" button in the top-right corner
2. Select "Prompt" from the dropdown menu
3. Build your prompt in the [Prompt Playground](/product/prompt-engineering-studio/prompt-playground)
4. Save the prompt to add it to your library
New prompts are automatically assigned a unique ID that you can use to reference them in your applications via the [Prompt API](/product/prompt-engineering-studio/prompt-api).
### Organizing with Folders
To create a new folder:
1. Click "Create" in the top-right corner
2. Select "Folder" from the dropdown
3. Name your folder based on its purpose or content type
To move prompts into folders:
1. Select the prompts you want to organize
2. Use the move option to place them in the appropriate folder
## Collaboration Features
The Prompt Library is designed for team collaboration:
* All team members with appropriate permissions can access shared prompts
* Changes are tracked by user and timestamp through [Prompt Versioning](/product/prompt-engineering-studio/prompt-versioning)
* Multiple team members can work on different prompts simultaneously
This collaborative approach ensures that your team maintains consistent prompt strategies while allowing everyone to contribute their expertise.
For more details on implementing prompts in your code, see the [Prompt API](/product/prompt-engineering-studio/prompt-api) documentation.
## Next Steps
Now that you understand the basics of the Prompt Library, explore these related features:
* [Prompt Playground](/product/prompt-engineering-studio/prompt-playground) - Create and test new prompts
* [Prompt Partials](/product/prompt-engineering-studio/prompt-partial) - Create reusable prompt components
* [Prompt Versioning](/product/prompt-engineering-studio/prompt-versioning) - Track changes to your prompts
* [Prompt API](/product/prompt-engineering-studio/prompt-api) - Integrate prompts into your applications
* [Prompt Observability](/product/prompt-engineering-studio/prompt-observability) - Monitor prompt performance
# Prompt Observability
Source: https://docs.portkey.ai/docs/product/prompt-engineering-studio/prompt-observability
Portkey's Prompt Observability provides comprehensive insights into how your prompts are performing in production. This feature allows you to track usage, monitor performance metrics, and analyze trends to continuously improve your prompts based on real-world usage.
The dashboard enables you to understand trends in your prompt usage over time and identify potential opportunities for optimization. For more details on using the analytics dashboard and available filters, refer to Portkey's [Analytics documentation](/product/observability/analytics).
## Prompt Logs
The Logs section on Portkey's dashboard provides detailed information about each individual prompt call, giving you visibility into exactly how your prompts are being used in real-time. You can easily filter your prompts using `prompt-id` in the logs view.
Each log entry shows the timestamp, model used, request path, user, tokens consumed, cost, and status. This granular data helps you understand exactly how your prompts are performing in production and identify any issues that need attention.
For information on filtering and searching logs, refer to Portkey's [Logs documentation](/product/observability/logs).
This chronological view makes it easy to see how your template is being used and how it's performing over time. You can quickly access detailed information about each call directly from this history view.
The template history is particularly useful when you're iterating on a prompt design, as it allows you to see the immediate impact of your changes. Combined with [Prompt Versioning](/product/prompt-engineering-studio/prompt-versioning), this gives you a complete view of your prompt's evolution and performance.
## Next Steps
Now that you understand how to monitor your prompts, explore these related features:
* [Prompt Versioning](/product/prompt-engineering-studio/prompt-versioning) - Track changes to your prompts over time
* [Prompt API](/product/prompt-engineering-studio/prompt-api) - Integrate optimized prompts into your applications
* [Prompt Playground](/product/prompt-engineering-studio/prompt-playground) - Test and refine your prompts based on observability insights
* [Prompt Partials](/product/prompt-engineering-studio/prompt-partial) - Create reusable components for your prompts
* [Tool Library](/product/prompt-engineering-studio/tool-library) - Enhance your prompts with specialized tools
# Prompt Partials
Source: https://docs.portkey.ai/docs/product/prompt-engineering-studio/prompt-partial
With Prompt Partials, you can save your commonly used templates (which could be your instruction set, data structure explanation, examples etc.) separately from your prompts and flexibly incorporate them wherever required.
You can create a new Partial and use it for any purpose in any of your prompt templates. For example, here's a prompt partial where we are separately storing the instructions:
Upon saving, each Partial generates a unique ID that you can use inside [prompt templates](/product/prompt-engineering-studio/prompt-playground#prompt-templates).
### Template Engine
Partials also follow the [Mustache template engine](https://mustache.github.io/) and let you easily handle data input at runtime by using tags.
Portkey supports `{{variable}}` tags as well as Mustache section blocks such as `{{#block}} ... {{/block}}` for conditional or repeated content.
When a partial is incorporated in a template, all the variables/blocks defined are also rendered on the Prompt variables section:
When a new Partial version is **Published**, any prompt templates that use the partial are automatically updated as well.
### Using Different Versions of Partials
Similar to prompt templates, you can reference specific versions of your prompt partials in the playground. By default, when you use a partial, Portkey uses the published version, but you can specify any version you want.
To reference a specific version of a partial, use the following syntax:
```
{{>prompt-partial-id@version-number}}
```
For example:
```
{{>pp-instructions-123@5}}
```
This will use version 5 of the prompt partial with ID "pp-instructions-123".
**Note:** Unlike prompt templates, prompt partials do not support `labels` for versioning. You can only reference partials by their version number, by `@latest`, or by the published version (the default).
### Making a Prompt Completion Request
All the variables/tags defined inside the partial can now be directly called at the time of making a `prompts.completions` request:
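For instance, a minimal sketch (the prompt ID and variable names are placeholders), where `instruction_set` is defined inside the partial and `user_query` in the template itself:

```python theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

response = portkey.prompts.completions.create(
    prompt_id="pp-support-agent-abc123",  # template that includes the partial
    variables={
        "instruction_set": "Answer politely and concisely.",
        "user_query": "How do I reset my password?",
    },
)
print(response)
```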
## Getting Started
When you first open the Playground, you'll see a clean interface with a few key components:
* A model selector where you can choose from 1600+ models across 20+ providers
* A messaging area where you'll craft your prompt
* A completion area where you'll see model responses
The beauty of the Playground is its simplicity - write a prompt, click "Generate Completion", and instantly see how the model responds.
### Crafting Your First Prompt
Creating a prompt is straightforward:
1. Select your model of choice - from OpenAI's GPT-4o to Anthropic's Claude or any model from your configured providers
2. Enter a system message (like "You're a helpful assistant")
3. Add your user message or query
4. Click "Generate Completion" to see the response
You can continue the conversation by adding more messages, helping you simulate real-world interactions with your AI.
### Using Prompt Templates in Your Application
Once you save a prompt in the Playground, you'll receive a `prompt ID` that you can use directly in your application code. This makes it easy to move from experimentation to production:
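For example (a sketch; the prompt ID and variable are placeholders for whatever you saved in the Playground):

```python theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

response = portkey.prompts.completions.create(
    prompt_id="pp-onboarding-email-xyz789",  # the ID shown after saving your prompt
    variables={"customer_name": "Ada"},
)
print(response)
```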
### Enhancing Prompts with Tools
Some models support function calling, allowing the AI to request specific information or take actions. The Playground makes it easy to experiment with these capabilities.
Click "Add Tool" button to define functions the model can call. For example:
```json theme={"system"}
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and state, e.g., San Francisco, CA"
}
},
"required": ["location"]
}
}
}
```
You can add multiple tools from the [tool library](/product/prompt-engineering-studio/tool-library) for the specific prompt template. You can also choose the parameter "tool\_choice" from the UI to control how the model uses the available tools.
This tool definition teaches the model how to request weather information for a specific location.
### Configuring Model Parameters
Each model offers various parameters that affect its output. Access these by clicking the "Parameters" button:
* **Temperature**: Controls randomness (lower = more deterministic)
* **Top P**: Alternative to temperature for controlling diversity
* **Max Tokens**: Limits response length
* **Response Format**: An important setting that lets you define how the model should format its output. There are currently three options:
* Text (default free-form text)
* JSON object (structured JSON response)
* JSON schema (requires providing a schema in the menu to make the model conform to your exact structure)
* **Thinking Mode**: Reasoning models think before they answer, producing a long internal chain of thought before responding to the user. You can access the model's reasoning/thinking process as sent by the provider. This feature:
* Is only available for select reasoning-capable models (like Claude 3.7 Sonnet)
* Can be activated by checking the "Thinking" checkbox in the Parameters panel
* Allows you to set a budget of tokens dedicated specifically to the thinking process (if the provider supports it)
And more... Experiment with these settings to find the perfect balance for your use case.
### Pretty Mode vs JSON Mode
The Playground offers two interface modes for working with prompts:
**Pretty Mode**
The default user-friendly interface with formatted messages and simple controls. This is ideal for most prompt engineering tasks and provides an intuitive way to craft and test prompts.
**JSON Mode**
For advanced users who need granular control, you can toggle to JSON mode by clicking the "JSON" button. This reveals the raw JSON structure of your prompt, allowing for precise editing and advanced configurations.
JSON mode is particularly useful when:
* Working with multimodal inputs like images
* Creating complex conditional logic
* Defining precise message structures
* Debugging API integration issues
You can switch between modes at any time using the toggle in the interface.
### Multimodality: Working with Images
For multimodal models that support images, you can upload images directly in the Playground using the 🧷 icon on the message input box.
Alternatively, you can use JSON mode to incorporate images using variables. Toggle from PRETTY to JSON mode using the button on the dashboard, then structure your prompt like this:
```json theme={"system"}
[
{
"content": [
{
"type": "text",
"text": "You're a helpful assistant."
}
],
"role": "system"
},
{
"role": "user",
"content": [
{ "type": "text", "text": "what's in this image?" },
{
"type": "image_url",
"image_url": {
"url" : "{{your_image_url}}"
}
}
]
}
]
```
Now you can pass the image URL as a variable in your prompt template, and the model will be able to analyze the image content.
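At request time, the image URL is passed like any other variable (a sketch; the prompt ID and URL are placeholders):

```python theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

response = portkey.prompts.completions.create(
    prompt_id="pp-vision-prompt-123",
    variables={"your_image_url": "https://example.com/photo.png"},
)
```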
# Prompt Templates
**Portkey uses** [**Mustache**](https://mustache.github.io/mustache.5.html) **under the hood to power the prompt templates.**
Mustache is a commonly used logic-less templating engine that follows a simple schema for defining variables and more.
With Mustache, prompt templates become even more extensible by letting you incorporate various `{{tags}}` in your prompt template and easily pass your data.
The most common usage of mustache templates is for `{{variables}}`, used to pass a value at runtime.
### Using Variables in Prompt Templates
Let's look at the following template:
As you can see, `{{customer_data}}` and `{{chat_query}}` are defined as variables in the template and you can pass their value at runtime:
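A minimal sketch of that runtime call (the prompt ID is a placeholder; the variable names come from the template above):

```python theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

response = portkey.prompts.completions.create(
    prompt_id="pp-support-chat-456",
    variables={
        "customer_data": "Name: Ada Lovelace, Plan: Pro, Region: EU",
        "chat_query": "Why was I charged twice this month?",
    },
)
```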
**And the prompt template uses the partial like this:**
**We can pass the data object inside the variables:**
### Publishing Prompts
Publishing a prompt version marks it as the default version that will be used when no specific version is requested. This is especially important for production environments.
Updating the Prompt does not automatically update your prompt in production. While updating, you can tick `Publish prompt changes` which will also update your prompt deployment to the latest version.
1. Create and test your new prompt version
2. When ready for production, click "Update" and check "Publish prompt changes"
3. Portkey will save the new version and mark it as the published version
4. All default API calls will now use this version
### Viewing Version History
**All** of your prompt versions can be seen by clicking the `Version History` button on the playground:
You can `Restore` or `Publish` any of the previous versions by clicking on the ellipsis menu.
### Comparing Versions
To compare different versions of your prompt:
1. Select the versions you want to compare from the version history panel
2. Click "Compare on playground" to see a side-by-side of different prompt versions
This helps you understand how prompts have evolved and which changes might have impacted performance.
## Using Different Prompt Versions
By default, when you pass the `PROMPT_ID` in `prompts.completions.create` method, Portkey sends the request to the `Published` version of your prompt.
You can also call any specific prompt version by appending version identifiers to your `PROMPT_ID`.
### Version Number References
**For example:**
```python theme={"system"}
response = portkey.prompts.completions.create(
prompt_id="pp-classification-prompt@12",
variables={ }
)
```
Here, the request is sent to **Version 12** of the prompt template.
### Special Version References
Portkey supports special version references:
```python theme={"system"}
# Latest version (may not be published)
response = portkey.prompts.completions.create(
prompt_id="pp-classification-prompt@latest",
variables={ }
)
# Published version (default when no suffix is provided)
response = portkey.prompts.completions.create(
prompt_id="pp-classification-prompt",
variables={ }
)
```
**Important Notes:**
* `@latest` refers to the most recent version, regardless of publication status
* When no suffix is provided, Portkey defaults to the `Published` version
* Each version is immutable once created - to make changes, you must create a new version
## Prompt Labels
Labels provide a more flexible and meaningful way to reference prompt versions compared to version numbers. You can add version tags/labels like `platform-team`, `gpt-model-prompt` to any prompt version to track changes and call them directly:
### Using Labels in Your Code
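Assuming labels are referenced with the same `@` suffix syntax as version numbers (an assumption; check your workspace setup), a call pinned to the `platform-team` label might look like this:

```python theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

response = portkey.prompts.completions.create(
    prompt_id="pp-classification-prompt@platform-team",  # label instead of a version number
    variables={},
)
```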
In the individual log for any request, you can also see the exact status of your request and verify if it was cached, or delivered from cache with two `usage` parameters:
* `cache_creation_input_tokens`: Number of tokens written to the cache when creating a new entry.
* `cache_read_input_tokens`: Number of tokens retrieved from the cache for this request.
# Remote MCP Support
Source: https://docs.portkey.ai/docs/integrations/llms/anthropic/remote-mcp
# Anyscale
Source: https://docs.portkey.ai/docs/integrations/llms/anyscale-llama2-mistral-zephyr
Integrate Anyscale endpoints with Portkey seamlessly and make your OSS models production-ready
Portkey's suite of features - AI gateway, observability, prompt management, and continuous fine-tuning are all enabled for the OSS models (Llama2, Mistral, Zephyr, and more) available on Anyscale endpoints.
### Using Prompts
Deploy the prompts using the Portkey SDK or REST API
### Portkey Features
Portkey supports its complete set of features via the OpenAI SDK, so you don't need to migrate away from it.
Please find more information in the relevant sections:
1. [Add metadata to your requests](/product/observability/metadata)
2. [Add gateway configs to the client or a single request](/product/ai-gateway/configs)
3. [Trace Anyscale requests](/product/observability/traces)
4. [Setup a fallback to Azure OpenAI](/product/ai-gateway/fallbacks)
# AWS SageMaker
Source: https://docs.portkey.ai/docs/integrations/llms/aws-sagemaker
Route to your AWS Sagemaker models through Portkey
SageMaker allows users to host any ML model on their own AWS infrastructure.
With Portkey, you can manage and restrict access, log requests, and more.
### Step 2: Configure Integration Details
Fill in the basic information for your integration:
* **Name**: A descriptive name for this integration (e.g., "Azure AI Production")
* **Short Description**: Optional context about this integration's purpose
* **Slug**: A unique identifier used in API calls (e.g., "azure-ai-prod")
### Step 3: Set Up Authentication
Portkey supports three authentication methods for Azure AI Foundry. For most use cases, we recommend using the **Default (API Key)** method.
1. Navigate to your model deployment in Azure AI Foundry
2. Click on the deployment to view details
3. Copy the **API Key** from the authentication section
4. Copy the **Target URI** - this is your endpoint URL
5. Note the **API Version** from your deployment URL
6. **Azure Deployment Name** (Optional): Only required for Managed Services deployments
#### Enter Credentials in Portkey
#### Configure Your Model
Enter the following details for your Azure deployment:
**Model Slug**: Use your Azure Model Deployment name exactly as it appears in Azure AI Foundry
**Short Description**: Optional description for team reference
**Model Type**: Select "Custom model"
**Base Model**: Choose the model that matches your deployment's API structure (e.g., select `gpt-4` for GPT-4 deployments)
## Step 2: Configure Integration Details
Fill in the basic information for your integration:
* **Name**: A descriptive name for this integration (e.g., "Azure OpenAI Production")
* **Short Description**: Optional context about this integration's purpose
* **Slug**: A unique identifier used in API calls (e.g., "azure-openai-prod")
## Step 3: Set Up Authentication
Portkey supports three authentication methods for Azure OpenAI. For most use cases, we recommend using the **Default (API Key)** method.
### Gather Your Azure Credentials
From your Azure portal, you'll need to collect:
### Enter Credentials in Portkey
1. Navigate to your model deployment in Azure
2. Click on the deployment to view details
3. Copy the **API Key** from the authentication section
Follow the same steps as above for each additional model deployment.
### 1. Install the Portkey SDK
Add the Portkey SDK to your application to interact with Azure OpenAI's API through Portkey's gateway.
Log view for an image generation request on Azure OpenAI
More information on image generation is available in the [API Reference](https://portkey.ai/docs/api-reference/completions-1#create-image).
***
## Making Requests Without Model Catalog
Here's how you can pass your Azure OpenAI details & secrets directly without using the Model Catalog feature.
### Key Mapping
In a typical Azure OpenAI request,
```sh theme={"system"}
curl https://{YOUR_RESOURCE_NAME}.openai.azure.com/openai/deployments/{YOUR_DEPLOYMENT_NAME}/chat/completions?api-version={API_VERSION} \
-H "Content-Type: application/json" \
-H "api-key: {YOUR_API_KEY}" \
-d '{
"model": "gpt-4o",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant"
},
{
"role": "user",
"content": "what is a portkey?"
}
]
}'
```
| Parameter | Node SDK | Python SDK | REST Headers |
| --------------------- | ----------------------------------- | ------------------------------------ | ----------------------------- |
| AZURE RESOURCE NAME | azureResourceName | azure\_resource\_name | x-portkey-azure-resource-name |
| AZURE DEPLOYMENT NAME | azureDeploymentId | azure\_deployment\_id | x-portkey-azure-deployment-id |
| API VERSION | azureApiVersion | azure\_api\_version | x-portkey-azure-api-version |
| AZURE API KEY | Authorization: "Bearer + {API_KEY}" | Authorization = "Bearer + {API_KEY}" | Authorization |
| AZURE MODEL NAME | azureModelName | azure\_model\_name | x-portkey-azure-model-name |
### Example
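A minimal sketch, assuming the Python client accepts the parameters from the table above as keyword arguments (all values are placeholders):

```python theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="azure-openai",
    azure_resource_name="YOUR_RESOURCE_NAME",
    azure_deployment_id="YOUR_DEPLOYMENT_NAME",
    azure_api_version="2024-02-01",
    Authorization="Bearer YOUR_AZURE_API_KEY",
)

response = portkey.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "what is a portkey?"}],
)
```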
* On the same [page](https://us-east-1.console.aws.amazon.com/iam/home#/security%5Fcredentials) under the '**Access keys'** section, where you created your Secret Access key, you will also find your **Access Key ID.**
* And lastly, get your `AWS Region` from the home page of [AWS Bedrock](https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/overview) as shown in the image below.
***
## Next Steps
The complete list of features supported in the SDK are available on the link below.
In addition to the `user` parameter, Portkey allows you to send arbitrary custom metadata with your requests. This powerful feature enables you to associate additional context or information with each request, which can be useful for analysis, debugging, or other custom use cases.
Set `strict_open_ai_compliance` to `false` to use the Computer Use tool.
1. Navigate to [Virtual Keys](https://app.portkey.ai/virtual-keys) in your Portkey dashboard
2. Click **"Add Key"** and enable the **"Local/Privately hosted provider"** toggle
3. Configure your deployment:
* Select the matching provider API specification (typically `OpenAI`)
* Enter your model's base URL in the `Custom Host` field
* Add required authentication headers and their values
4. Click **"Create"** to generate your virtual key
You can now use this virtual key in your requests:
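For example (a sketch; the virtual key slug and model name are placeholders for your own deployment):

```python theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="my-private-llm",  # the virtual key created above
)

response = portkey.chat.completions.create(
    model="llama-3-8b-instruct",  # whatever model your self-hosted endpoint serves
    messages=[{"role": "user", "content": "Hello!"}],
)
```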
In addition to the `user` parameter, Portkey allows you to send arbitrary custom metadata with your requests. This powerful feature enables you to associate additional context or information with each request, which can be useful for analysis, debugging, or other custom use cases.
1. Navigate to [Virtual Keys](https://app.portkey.ai/virtual-keys) in your Portkey dashboard
2. Click **"Add Key"** and enable the **"Local/Privately hosted provider"** toggle
3. Configure your deployment:
* Select the matching provider API specification (typically `OpenAI`)
* Enter your model's base URL in the `Custom Host` field
* Add required authentication headers and their values
4. Click **"Create"** to generate your virtual key
You can now use this virtual key in your requests:
## Codestral vs. Mistral API Endpoint
Here's a handy guide for when you might want to make your requests to the Codestral endpoint vs. the original Mistral API endpoint:
[For more, check out Mistral's Code Generation guide here](https://docs.mistral.ai/capabilities/code%5Fgeneration/#operation/listModels).
***
## Managing Mistral AI Prompts
You can manage all prompts to Mistral AI in the [Prompt Library](/product/prompt-library). All the current models of Mistral AI are supported and you can easily start testing different prompts.
Once you're ready with your prompt, you can use the `portkey.prompts.completions.create` interface to use the prompt in your application.
### Mistral Tool Calling
The tool calling feature lets models trigger external tools based on conversation context. You define the available functions, the model chooses when to use them, and your application executes them and returns the results.
Portkey supports Mistral Tool Calling and makes it interoperable across multiple providers. With Portkey Prompts, you can templatize your prompts and tool schemas as well.
### 2. Install the Portkey SDK and Initialize with Nomic
Add the Portkey SDK to your application to interact with Nomic's API through Portkey's gateway.
1. Navigate to [Virtual Keys](https://app.portkey.ai/virtual-keys) in your Portkey dashboard
2. Click **"Add Key"** and enable the **"Local/Privately hosted provider"** toggle
3. Configure your deployment:
* Select the matching provider API specification (typically `OpenAI`)
* Enter your model's base URL in the `Custom Host` field
* Add required authentication headers and their values
4. Click **"Create"** to generate your virtual key
You can now use this virtual key in your requests:
In addition to the `user` parameter, Portkey allows you to send arbitrary custom metadata with your requests. This powerful feature enables you to associate additional context or information with each request, which can be useful for analysis, debugging, or other custom use cases.
2. Use this prompt in your codebase using the Portkey SDK.
More information on image generation is available in the [API Reference](/provider-endpoints/images/create-image#create-image).
### Audio - Transcription, Translation, and Text-to-Speech
Portkey's multimodal Gateway also supports the `audio` methods on OpenAI API. Check out the below guides for more info:
All requests, including those with fewer than 1024 tokens, will display a `cached_tokens` field under `usage.prompt_tokens_details` in the [chat completions object](https://platform.openai.com/docs/api-reference/chat/object), indicating how many of the prompt tokens were a cache hit.
For requests under 1024 tokens, `cached_tokens` will be zero.
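For example, with an OpenAI model you could read the field like this (a sketch; the long prompt is a placeholder):

```python theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="openai-prod")

long_context = "lorem ipsum " * 1000  # placeholder for a prompt of 1024+ tokens

response = portkey.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": long_context + "\nSummarize the above."}],
)

print(response.usage.prompt_tokens_details.cached_tokens)
```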
Set `strict_open_ai_compliance` to `false` to use this feature.
### Get Your Service Account JSON
* [Follow this process](https://cloud.google.com/iam/docs/keys-create-delete) to get your Service Account JSON.
When selecting Service Account File as your authentication method, you'll need to:
1. Upload your Google Cloud service account JSON file
2. Specify the Vertex Region
This method is particularly important for using self-deployed models, as your service account must have the `aiplatform.endpoints.predict` permission to access custom endpoints.
Learn more about permission on your Vertex IAM key [here](https://cloud.google.com/vertex-ai/docs/general/iam-permissions).
1. Navigate to [Virtual Keys](https://app.portkey.ai/virtual-keys) in your Portkey dashboard
2. Click **"Add Key"** and enable the **"Local/Privately hosted provider"** toggle
3. Configure your deployment:
* Select the matching provider API specification (typically `OpenAI`)
* Enter your model's base URL in the `Custom Host` field
* Add required authentication headers and their values
4. Click **"Create"** to generate your virtual key
You can now use this virtual key in your requests:
With the ability to create and manage multiple organizations, you can tailor access control to match your company's structure and project requirements. Users can be assigned to specific organizations, and they can seamlessly switch between them using Portkey's intuitive user interface.
## 2. Fine-Grained User Roles and Permissions
Portkey offers a comprehensive Role-Based Access Control (RBAC) system that allows you to define and assign user roles with granular permissions. By default, Portkey provides three roles: `Owner`, `Admin`, and `Member`, each with a predefined set of permissions across various features.
* `Owners` have complete control over the organization, including user management, billing, and all platform features.
* `Admins` have elevated privileges, allowing them to manage users, prompts, configs, guardrails, integrations, providers, and API keys.
* `Members` have access to essential features like logs, analytics, prompts, configs, integrations, and providers, with limited permissions.
| Feature | Owner Role | Admin Role | Member Role |
| ------------------------ | ------------------------------------------- | ------------------------------------------- | -------------------------- |
| Logs and Analytics | View, Filter, Group | View, Filter, Group | View, Filter, Group |
| Prompts | List, View, Create, Update, Delete, Publish | List, View, Create, Update, Delete, Publish | List, View, Create, Update |
| Configs | List, View, Create, Update, Delete | List, View, Create, Update, Delete | List, View, Create |
| Guardrails | List, View, Create, Update, Delete | List, View, Create, Update, Delete | List, View, Create, Update |
| Integrations | List, Create, Edit, Duplicate, Delete, Copy | List, Create, Edit, Duplicate, Delete, Copy | List, Copy |
| Model Catalog: Providers | List, Create, Edit, Duplicate, Delete, Copy | List, Create, Edit, Duplicate, Delete, Copy | List, Copy |
| Model Catalog: Models | List, Create, Edit, Duplicate, Delete, Copy | List, Create, Edit, Duplicate, Delete, Copy | List, Copy |
| Team | Add users, assign roles | Add users, assign roles | - |
| Organisation | Update | Update | - |
| API Keys | Create, Edit, Delete, Update, Rotate | Create, Edit, Delete, Update, Rotate | - |
| Billing | Manage | - | - |
You can easily add team members to your organization and assign them appropriate roles based on their responsibilities. Portkey's user-friendly interface simplifies the process of inviting users and managing their roles, ensuring that the right people have access to the right resources.
## 3. Secure and Customizable API Key Management
Portkey provides a secure and flexible API key management system that allows you to create and manage multiple API keys with fine-grained permissions. Each API key can be customized to grant specific access levels to different features, such as metrics, completions, prompts, configs, guardrails, integrations, providers, team management, and API key management.
| Feature | Permissions | Default |
| --------------------------- | ----------------------------- | -------- |
| Metrics | Disabled, Enabled | Disabled |
| Completions (all LLM calls) | Disabled, Enabled | Enabled |
| Prompts | Disabled, Read, Write, Delete | Read |
| Configs | Disabled, Read, Write, Delete | Disabled |
| Guardrails | Disabled, Read, Write, Delete | Disabled |
| Integrations | Disabled, Read, Write, Delete | Disabled |
| Model Catalog: Providers | Disabled, Read, Write, Delete | Disabled |
| Model Catalog: Models | Disabled, Read, Write, Delete | Disabled |
| Users (Team Management) | Disabled, Read, Write, Delete | Disabled |
By default, a new organization is provisioned with a master API key that has all permissions enabled. Owners and admins can edit and manage these keys, as well as create new API keys with tailored permissions. This granular control enables you to enforce the principle of least privilege, ensuring that each API key has access only to the necessary resources.
Portkey's API key management system provides a secure and auditable way to control access to your organization's data and resources, reducing the risk of unauthorized access and data breaches.
## Audit Logs
Portkey maintains detailed audit logs that capture all administrative activities across the platform. These logs provide visibility into actions related to prompts, configs, guardrails, integrations, providers, team management, organization updates, and API key modifications.
Each log entry includes information about the user, the action performed, the affected resource, and a timestamp. This ensures traceability and accountability, helping teams monitor changes and investigate any unauthorized activity.
Audit logs can be filtered by user, action type, resource, and time range, making it easy to track specific events. Organizations can use this data to enforce security policies, ensure compliance, and maintain operational integrity.
Portkey’s audit logging system provides a clear and structured way to review platform activity, ensuring security and compliance across all operations.
# OTel Integration (Analytics Data)
Source: https://docs.portkey.ai/docs/product/enterprise-offering/analytics-logs-export
Portkey supports sending your Analytics data to OpenTelemetry (OTel) compatible collectors, allowing you to integrate Portkey's analytics with your existing observability infrastructure.
## Overview
While Portkey leverages Clickhouse as the primary Analytics Store for the Control Panel by default, enterprise customers can integrate Portkey's analytics data with their existing data infrastructure through OpenTelemetry.
## Configuration
Portkey supports pushing your analytics data to an OTEL compatible endpoint. The following environment variables are needed for pushing to OTEL:
```yaml theme={"system"}
OTEL_PUSH_ENABLED: true
OTEL_ENDPOINT: http://localhost:4318
```
Additionally, you can configure arbitrary resource attributes of the OTEL logs by setting a comma-separated value for `OTEL_RESOURCE_ATTRIBUTES`:
```
OTEL_RESOURCE_ATTRIBUTES: ApplicationShortName=gateway,AssetId=12323,deployment.service=production
```
## Integration Options
Enterprise customers commonly use these analytics exports with:
* **Datadog**: Monitor and analyze your AI operations alongside other application metrics
* **AWS S3**: Store analytics data for long-term retention and analysis
* **Other OTEL-compatible systems**: Any system that can ingest OpenTelemetry data can be used with this feature
## Use Cases
This feature enables:
* Centralized observability across your entire tech stack
* Long-term storage of analytics data
* Custom analytics dashboards in your preferred tools
* Integration with existing alerting systems
## Getting Support
For additional assistance with setting up analytics data export:
* Join our [Discord community](https://portkey.sh/reddit-discord)
* Email us at [support@portkey.ai](mailto:support@portkey.ai)
Our team can help you with best practices for configuring your OTEL collectors and integrating with your existing systems.
# Audit Logs
Source: https://docs.portkey.ai/docs/product/enterprise-offering/audit-logs
Track and monitor all administrative activities across your Portkey organization with comprehensive audit logging.
## Key Benefits
* **Enhanced Security**: Track all changes to your organization's resources and configurations
* **Compliance Support**: Maintain detailed records to help meet regulatory requirements
* **Operational Visibility**: Understand who is making changes and when
* **Troubleshooting**: Investigate issues by reviewing recent configuration changes
* **Accountability**: Ensure users are responsible for their actions within the platform
## Logged Information
Each audit log entry contains detailed information about administrative activities:
| Field | Description |
| --------------- | ----------------------------------------------------------------------- |
| Timestamp | Date and time when the action occurred |
| User | The individual who performed the action |
| Workspace | The workspace context in which the action was performed (if applicable) |
| Action | The type of operation performed (create, update, delete, etc.) |
| Resource | The specific resource or entity that was affected |
| Response Status | HTTP status code indicating the result of the action |
| Client IP | IP address from which the request originated |
| Country | Geographic location associated with the request |
### Available Filters
* **Method**: Filter by HTTP method (PUT, POST, DELETE)
* **Request ID**: Search for a specific request by its unique identifier
* **Resource Type**: Filter by type of resource affected:
* Workspaces
* API Keys
* Integrations
* Model Catalog
* Configs
* Prompts
* Guardrails
* Integrations
* Collections
* Organization
* Labels
* Custom Resource Types
* **Action**: Filter by the type of action performed:
* Create
* Update
* Delete
* Publish
* Export
* Rotate
* Manage
* Duplicate
* **Response Status**: Filter by HTTP response status codes
* **Workspace**: Filter by specific workspace
* **User**: Filter by the user who performed the action
* **Client IP**: Filter by originating IP address
* **Country**: Filter by geographic location of requests
* **Time Range**: Filter logs within a specific time period
## Enterprise Features
Portkey's Audit Logs include enterprise-grade capabilities:
### 1. Complete Visibility
* Full user attribution for every action
* Detailed timestamps and change history
* Cross-workspace tracking
* Searchable audit trail
### 2. Compliance & Security
* SOC 2, ISO 27001, GDPR, and HIPAA compliant
* PII data protection
* Indefinite log retention
### 3. Enterprise-Grade Features
* Role-based access control
* Cross-organization visibility
* Custom retention policies
> As a Director of AI Infrastructure at a Fortune 100 Healthcare company explained: *"Having a detailed audit trail isn't just about compliance. It's about being able to debug production issues quickly, understand usage patterns, and make data-driven decisions about our AI infrastructure."*
## Related Features
### Privacy Protections
At Portkey AI, we prioritize user privacy. Our privacy protocols are designed to comply with international data protection regulations, ensuring that all data is handled responsibly.
We engage in minimal data retention and deploy advanced anonymization technologies to protect personal information and sensitive data from being exposed or improperly used.
Read our privacy policy here - [https://portkey.ai/privacy-policy](https://portkey.ai/privacy-policy)
## System Integrity and Reliability
### **Network and System Security**:
We protect our systems with advanced firewall technologies and DDoS prevention mechanisms to thwart a wide range of online threats. Our security measures are designed to shield our infrastructure from malicious attacks and ensure continuous service availability.
### **Reliability and Availability**:
Portkey AI offers an industry-leading [99.995% uptime](https://status.portkey.ai), supported by a global network of 310 data centers.
This extensive distribution allows for effective load balancing and edge deployments, minimizing latency and ensuring fast, reliable service delivery across geographical locations.
Our failover mechanisms are sophisticated, designed to handle unexpected scenarios seamlessly and without service interruption.
## Incident Management and Continuous Improvement
### Incident Response
Our proactive incident response team is equipped with the tools and procedures necessary to quickly address and resolve security incidents.
This includes comprehensive risk assessments, immediate containment actions, and detailed investigations to prevent future occurrences.
We maintain transparent communication with our clients throughout the incident management process. Please review our [status page](https://status.portkey.ai) for incident reports.
### Updates and Continuous Improvement
Security at Portkey AI is dynamic; we continually refine our security measures and systems to address emerging threats and incorporate best practices. Our ongoing commitment to improvement helps us stay ahead of the curve in cybersecurity and operational performance.
## Contact Information
For more detailed information or specific inquiries regarding our security measures, please reach out to our support team:
* **Email**: [support@portkeyai.com](mailto:support@portkeyai.com), [dpo@portkey.ai](mailto:dpo@portkey.ai)
## Useful Links
[Privacy Policy](https://portkey.ai/privacy-policy)
[Terms of Service](https://portkey.ai/terms)
[Data Processing Agreement](https://portkey.ai/dpa)
[Trust Portal](https://trust.portkey.ai)
# Connect Bedrock with Amazon Assumed Role
Source: https://docs.portkey.ai/docs/product/model-catalog/connect-bedrock-with-amazon-assumed-role
How to create a Bedrock integration using Amazon Assumed Role Authentication on Portkey
## Create an AWS Role for Portkey to Assume
The role you create will be used by Portkey to execute InvokeModel commands on Bedrock models in your AWS account. The setup process establishes a minimal-permission ("least privilege") role and allows Portkey to assume it.
### Create a permission policy in your AWS account using the following JSON
```json theme={"system"}
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BedrockConsole",
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": "*"
}
]
}
```
### Create a new IAM role
Choose *AWS account* as the trusted entity type. If you set an external ID, be sure to copy it; we will need it later.
### Add the above policy to the role
Search for the policy you created above and add it to the role.
### Configure Trust Relationship for the role
Once the role is created, open the role and navigate to the *Trust relationships* tab and click *Edit trust policy*.
This is where you will add the Portkey AWS account as a trusted entity.
```sh Portkey Account ARN theme={"system"}
arn:aws:iam::299329113195:role/portkey-app
```
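For reference, a trust policy of roughly this shape (a sketch; include the `Condition` block only if you set an external ID, and replace the placeholder value) grants that access:

```json theme={"system"}
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::299329113195:role/portkey-app"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "YOUR_EXTERNAL_ID" }
      }
    }
  ]
}
```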
You're all set! You can now use the new provider to invoke Bedrock models.
# Open Source
Source: https://docs.portkey.ai/docs/product/open-source
## [Portkey AI Gateway](https://github.com/portkey-ai/rubeus)
We have open sourced our battle-tested AI Gateway to the community - it connects to 250+ LLMs with a unified interface and a single endpoint, and lets you effortlessly set up fallbacks, load balancing, retries, and more.
This gateway is in production at Portkey, processing billions of tokens every day.
#### [Contribute here](https://github.com/portkey-ai/rubeus).
***
## [AI Grants Finder](https://grantsfinder.portkey.ai/)
Community resource for AI builders to find `GPU credits`, `grants`, `AI accelerators`, or `investments` - all in a single place. Continuously updated, and sometimes also featuring [exclusive deals](https://twitter.com/PortkeyAI/status/1692463628514156859).
Access the data [here](https://airtable.com/appUjtBcdLQIgusqW/shrAU1e4M5twTmRal).
***
## [Gateway Reports](https://portkey.ai/blog/tag/benchmarks/)
We collaborate with the community to dive deep into how LLMs and their inference providers perform at scale, and publish gateway reports. We track latencies, uptime, and cost changes, and how they fluctuate across dimensions like time of day, region, token length, and more.
#### [2025 AI Infrastructure Benchmark Report](https://portkey.ai/llms-in-prod-25)
Insights from analyzing 2 trillion+ tokens, across 90+ regions and 650+ teams in production. The report contains:
* Trends shaping AI adoption and LLM provider growth.
* Benchmarks to optimize speed, cost and reliability.
* Strategies to scale production-grade AI systems.
***
## Collaborations
Portkey supports various open source projects with additional production capabilities through its custom integrations.
## Components and Sizing Recommendations
| Component | Options | Sizing Recommendations |
| ------------------------------------ | ------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| AI Gateway                           | Deploy as a Docker container in your Kubernetes cluster using Helm Charts | AWS NodeGroup t4g.medium instance, with at least 4 GiB of memory and two vCPUs. For high reliability, deploy across multiple Availability Zones. |
| Logs store | AWS S3 | Each log document is \~10kb in size (uncompressed) |
| Cache (Prompts, Configs & Providers) | Elasticache or self-hosted Redis | Deploy in the same VPC as the Portkey Gateway. |
## Helm Chart
This deployment uses the Portkey AI hybrid Helm chart to deploy the Portkey AI Gateway. You can find more information about the Helm chart in the [Portkey AI Helm Chart GitHub Repository](https://github.com/Portkey-AI/helm/blob/main/charts/portkey-gateway/README.md).
## Prerequisites
1. Create a Portkey account on [Portkey AI](https://app.portkey.ai)
2. The Portkey team will share the credentials for the private Docker registry.
## Marketplace Listing
### Visit Portkey AI AWS Marketplace Listing
You can find the Portkey AI AWS Marketplace listing [here](https://aws.amazon.com/marketplace/pp/prodview-o2leb4xcrkdqa).
### Subscribe to Portkey AI Enterprise Edition
Subscribe to the Portkey AI Enterprise Edition to gain access to the Portkey AI Gateway.
### Quick Launch
Upon subscribing to the Portkey AI Enterprise Edition, you will be able to select Quick Launch from within your AWS Console Subscriptions.
### Launch the Cloud Formation Template
Select the Portkey AI Enterprise Edition and click on Quick Launch.
### Run the Cloud Formation Template
Fill in the required parameters, click **Next**, and run the Cloud Formation Template.
## Cloud Formation Steps
* Creates a new EKS cluster and NodeGroup in your selected VPC and Subnets
* Sets up IAM Roles needed for S3 bucket access using STS and Lambda execution
* Uses AWS Lambda to:
* Install the Portkey AI Helm chart to your EKS cluster
* Upload the values.yaml file to the S3 bucket
* Allows for changes to the values file or helm chart deployment by updating and re-running the same Lambda function in your AWS account
### Cloudformation Template
> Our cloudformation template has passed the AWS Marketplace validation and security review.
```yaml portkey-hybrid-eks-cloudformation.template.yaml [expandable] theme={"system"}
AWSTemplateFormatVersion: "2010-09-09"
Description: Portkey deployment template for AWS Marketplace
Metadata:
AWS::CloudFormation::Interface:
ParameterGroups:
- Label:
default: "Required Parameters"
Parameters:
- VPCID
- Subnet1ID
- Subnet2ID
- ClusterName
- NodeGroupName
- NodeGroupInstanceType
- SecurityGroupID
- CreateNewCluster
- HelmChartVersion
- PortkeyDockerUsername:
NoEcho: true
- PortkeyDockerPassword:
NoEcho: true
- PortkeyClientAuth:
NoEcho: true
- Label:
default: "Optional Parameters"
Parameters:
- PortkeyOrgId
- PortkeyGatewayIngressEnabled
- PortkeyGatewayIngressSubdomain
- PortkeyFineTuningEnabled
Parameters:
# Required Parameters
VPCID:
Type: AWS::EC2::VPC::Id
Description: VPC where the EKS cluster will be created
Default: Select a VPC
Subnet1ID:
Type: AWS::EC2::Subnet::Id
Description: First subnet ID for EKS cluster
Default: Select your subnet
Subnet2ID:
Type: AWS::EC2::Subnet::Id
Description: Second subnet ID for EKS cluster
Default: Select your subnet
# Optional Parameters with defaults
ClusterName:
Type: String
Description: Name of the EKS cluster (if not provided, a new EKS cluster will be created)
Default: portkey-eks-cluster
NodeGroupName:
Type: String
Description: Name of the EKS node group (if not provided, a new EKS node group will be created)
Default: portkey-eks-cluster-node-group
NodeGroupInstanceType:
Type: String
Description: EC2 instance type for the node group (if not provided, t3.medium will be used)
Default: t3.medium
AllowedValues:
- t3.medium
- t3.large
- t3.xlarge
PortkeyDockerUsername:
Type: String
Description: Docker username for Portkey (provided by the Portkey team)
Default: portkeyenterprise
PortkeyDockerPassword:
Type: String
Description: Docker password for Portkey (provided by the Portkey team)
Default: ""
NoEcho: true
PortkeyClientAuth:
Type: String
Description: Portkey Client ID (provided by the Portkey team)
Default: ""
NoEcho: true
PortkeyOrgId:
Type: String
Description: Portkey Organisation ID (provided by the Portkey team)
Default: ""
HelmChartVersion:
Type: String
Description: Version of the Helm chart to deploy
Default: "latest"
AllowedValues:
- latest
SecurityGroupID:
Type: String
Description: Optional security group ID for the EKS cluster (if not provided, a new security group will be created)
Default: ""
CreateNewCluster:
Type: String
AllowedValues: [true, false]
Default: true
Description: Whether to create a new EKS cluster or use an existing one
PortkeyGatewayIngressEnabled:
Type: String
AllowedValues: [true, false]
Default: false
Description: Whether to enable the Portkey Gateway ingress
PortkeyGatewayIngressSubdomain:
Type: String
Description: Subdomain for the Portkey Gateway ingress
Default: ""
PortkeyFineTuningEnabled:
Type: String
AllowedValues: [true, false]
Default: false
Description: Whether to enable the Portkey Fine Tuning
Conditions:
CreateSecurityGroup: !Equals [!Ref SecurityGroupID, ""]
ShouldCreateCluster: !Equals [!Ref CreateNewCluster, true]
Resources:
PortkeyAM:
Type: AWS::IAM::Role
DeletionPolicy: Delete
Properties:
RoleName: PortkeyAM
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Principal:
AWS: !Sub "arn:aws:iam::${AWS::AccountId}:root"
Action: sts:AssumeRole
Policies:
- PolicyName: PortkeyEKSAccess
PolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Action:
- "eks:DescribeCluster"
- "eks:ListClusters"
- "eks:ListNodegroups"
- "eks:ListFargateProfiles"
- "eks:ListNodegroups"
- "eks:CreateCluster"
- "eks:CreateNodegroup"
- "eks:DeleteCluster"
- "eks:DeleteNodegroup"
- "eks:UpdateClusterConfig"
- "eks:UpdateKubeconfig"
Resource: !Sub "arn:aws:eks:${AWS::Region}:${AWS::AccountId}:cluster/${ClusterName}"
- Effect: Allow
Action:
- "sts:AssumeRole"
Resource: !Sub "arn:aws:iam::${AWS::AccountId}:role/PortkeyAM"
- Effect: Allow
Action:
- "sts:GetCallerIdentity"
Resource: "*"
- Effect: Allow
Action:
- "iam:ListRoles"
- "iam:GetRole"
Resource: "*"
- Effect: Allow
Action:
- "bedrock:InvokeModel"
- "bedrock:InvokeModelWithResponseStream"
Resource: "*"
- Effect: Allow
Action:
- "s3:GetObject"
- "s3:PutObject"
Resource:
- !Sub "arn:aws:s3:::${AWS::AccountId}-${AWS::Region}-portkey-logs/*"
PortkeyLogsBucket:
Type: AWS::S3::Bucket
DeletionPolicy: Delete
Properties:
BucketName: !Sub "${AWS::AccountId}-${AWS::Region}-portkey-logs"
VersioningConfiguration:
Status: Enabled
PublicAccessBlockConfiguration:
BlockPublicAcls: true
BlockPublicPolicy: true
IgnorePublicAcls: true
RestrictPublicBuckets: true
BucketEncryption:
ServerSideEncryptionConfiguration:
- ServerSideEncryptionByDefault:
SSEAlgorithm: AES256
# EKS Cluster Role
EksClusterRole:
Type: AWS::IAM::Role
DeletionPolicy: Delete
Properties:
RoleName: EksClusterRole-Portkey
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Principal:
Service: eks.amazonaws.com
Action: sts:AssumeRole
ManagedPolicyArns:
- arn:aws:iam::aws:policy/AmazonEKSClusterPolicy
# EKS Cluster Security Group (if not provided)
EksSecurityGroup:
Type: AWS::EC2::SecurityGroup
Condition: CreateSecurityGroup
DeletionPolicy: Delete
Properties:
GroupDescription: Security group for Portkey EKS cluster
VpcId: !Ref VPCID
SecurityGroupIngress:
- IpProtocol: tcp
FromPort: 8787
ToPort: 8787
CidrIp: PORTKEY_IP
SecurityGroupEgress:
- IpProtocol: tcp
FromPort: 443
ToPort: 443
CidrIp: 0.0.0.0/0
# EKS Cluster
EksCluster:
Type: AWS::EKS::Cluster
Condition: ShouldCreateCluster
DeletionPolicy: Delete
DependsOn: EksClusterRole
Properties:
Name: !Ref ClusterName
Version: "1.32"
RoleArn: !GetAtt EksClusterRole.Arn
ResourcesVpcConfig:
SecurityGroupIds:
- !If
- CreateSecurityGroup
- !Ref EksSecurityGroup
- !Ref SecurityGroupID
SubnetIds:
- !Ref Subnet1ID
- !Ref Subnet2ID
AccessConfig:
AuthenticationMode: API_AND_CONFIG_MAP
LambdaExecutionRole:
Type: AWS::IAM::Role
DeletionPolicy: Delete
DependsOn: EksCluster
Properties:
RoleName: PortkeyLambdaRole
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Principal:
Service: lambda.amazonaws.com
Action: sts:AssumeRole
ManagedPolicyArns:
- arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
Policies:
- PolicyName: EKSAccess
PolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Action:
- ec2:DescribeInstances
- ec2:DescribeRegions
Resource: "*"
- Effect: Allow
Action:
- "sts:AssumeRole"
Resource: !GetAtt PortkeyAM.Arn
- Effect: Allow
Action:
- "s3:GetObject"
- "s3:PutObject"
Resource:
- !Sub "arn:aws:s3:::${AWS::AccountId}-${AWS::Region}-portkey-logs/*"
- Effect: Allow
Action:
- "eks:DescribeCluster"
- "eks:ListClusters"
- "eks:ListNodegroups"
- "eks:ListFargateProfiles"
- "eks:ListNodegroups"
- "eks:CreateCluster"
- "eks:CreateNodegroup"
- "eks:DeleteCluster"
- "eks:DeleteNodegroup"
- "eks:CreateFargateProfile"
- "eks:DeleteFargateProfile"
- "eks:DescribeFargateProfile"
- "eks:UpdateClusterConfig"
- "eks:UpdateKubeconfig"
Resource: !Sub "arn:aws:eks:${AWS::Region}:${AWS::AccountId}:cluster/${ClusterName}"
LambdaClusterAdmin:
Type: AWS::EKS::AccessEntry
DependsOn: EksCluster
Properties:
ClusterName: !Ref ClusterName
PrincipalArn: !GetAtt LambdaExecutionRole.Arn
Type: STANDARD
KubernetesGroups:
- system:masters
AccessPolicies:
- PolicyArn: "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
AccessScope:
Type: "cluster"
# Node Group Role
NodeGroupRole:
Type: AWS::IAM::Role
DeletionPolicy: Delete
Properties:
RoleName: NodeGroupRole-Portkey
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Principal:
Service: ec2.amazonaws.com
Action: sts:AssumeRole
ManagedPolicyArns:
- arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
- arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy
- arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly
# EKS Node Group
EksNodeGroup:
Type: AWS::EKS::Nodegroup
DependsOn: EksCluster
DeletionPolicy: Delete
Properties:
CapacityType: ON_DEMAND
ClusterName: !Ref ClusterName
NodegroupName: !Ref NodeGroupName
NodeRole: !GetAtt NodeGroupRole.Arn
InstanceTypes:
- !Ref NodeGroupInstanceType
ScalingConfig:
MinSize: 1
DesiredSize: 1
MaxSize: 1
Subnets:
- !Ref Subnet1ID
- !Ref Subnet2ID
PortkeyInstallerFunction:
Type: AWS::Lambda::Function
DependsOn: EksNodeGroup
DeletionPolicy: Delete
Properties:
FunctionName: portkey-eks-installer
Runtime: nodejs18.x
Handler: index.handler
MemorySize: 1024
EphemeralStorage:
Size: 1024
Code:
ZipFile: |
const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream');
const path = require('path');
const https = require('https');
const { promisify } = require('util');
const { execSync } = require('child_process');
const { EKSClient, DescribeClusterCommand } = require('@aws-sdk/client-eks');
async function unzipAwsCli(zipPath, destPath) {
// ZIP file format: https://en.wikipedia.org/wiki/ZIP_(file_format)
const data = fs.readFileSync(zipPath);
let offset = 0;
// Find end of central directory record
const EOCD_SIGNATURE = 0x06054b50;
for (let i = data.length - 22; i >= 0; i--) {
if (data.readUInt32LE(i) === EOCD_SIGNATURE) {
offset = i;
break;
}
}
// Read central directory info
const numEntries = data.readUInt16LE(offset + 10);
let centralDirOffset = data.readUInt32LE(offset + 16);
// Process each file
for (let i = 0; i < numEntries; i++) {
// Read central directory header
const signature = data.readUInt32LE(centralDirOffset);
if (signature !== 0x02014b50) {
throw new Error('Invalid central directory header');
}
const fileNameLength = data.readUInt16LE(centralDirOffset + 28);
const extraFieldLength = data.readUInt16LE(centralDirOffset + 30);
const fileCommentLength = data.readUInt16LE(centralDirOffset + 32);
const localHeaderOffset = data.readUInt32LE(centralDirOffset + 42);
// Get filename
const fileName = data.slice(
centralDirOffset + 46,
centralDirOffset + 46 + fileNameLength
).toString();
// Read local file header
const localSignature = data.readUInt32LE(localHeaderOffset);
if (localSignature !== 0x04034b50) {
throw new Error('Invalid local file header');
}
const localFileNameLength = data.readUInt16LE(localHeaderOffset + 26);
const localExtraFieldLength = data.readUInt16LE(localHeaderOffset + 28);
// Get file data
const fileDataOffset = localHeaderOffset + 30 + localFileNameLength + localExtraFieldLength;
const compressedSize = data.readUInt32LE(centralDirOffset + 20);
const uncompressedSize = data.readUInt32LE(centralDirOffset + 24);
const compressionMethod = data.readUInt16LE(centralDirOffset + 10);
// Create directory if needed
const fullPath = path.join(destPath, fileName);
const directory = path.dirname(fullPath);
if (!fs.existsSync(directory)) {
fs.mkdirSync(directory, { recursive: true });
}
// Extract file
if (!fileName.endsWith('/')) { // Skip directories
const fileData = data.slice(fileDataOffset, fileDataOffset + compressedSize);
if (compressionMethod === 0) { // Stored (no compression)
fs.writeFileSync(fullPath, fileData);
} else if (compressionMethod === 8) { // Deflate
const inflated = require('zlib').inflateRawSync(fileData);
fs.writeFileSync(fullPath, inflated);
} else {
throw new Error(`Unsupported compression method: ${compressionMethod}`);
}
}
// Move to next entry
centralDirOffset += 46 + fileNameLength + extraFieldLength + fileCommentLength;
}
}
async function extractTarGz(source, destination) {
// First, let's decompress the .gz file
const gunzip = promisify(zlib.gunzip);
console.log('Reading source file...');
const compressedData = fs.readFileSync(source);
console.log('Decompressing...');
const tarData = await gunzip(compressedData);
// Now we have the raw tar data
// Tar files are made up of 512-byte blocks
let position = 0;
while (position < tarData.length) {
// Read header block
const header = tarData.slice(position, position + 512);
position += 512;
// Get filename from header (first 100 bytes)
const filename = header.slice(0, 100)
.toString('utf8')
.replace(/\0/g, '')
.trim();
if (!filename) break; // End of tar
// Get file size from header (bytes 124-136)
const sizeStr = header.slice(124, 136)
.toString('utf8')
.replace(/\0/g, '')
.trim();
const size = parseInt(sizeStr, 8); // Size is in octal
console.log(`Found file: ${filename} (${size} bytes)`);
if (filename === 'linux-amd64/helm') {
console.log('Found helm binary, extracting...');
// Extract the file content
const content = tarData.slice(position, position + size);
// Write to destination
const outputPath = path.join(destination, 'helm');
fs.writeFileSync(outputPath, content);
console.log(`Helm binary extracted to: ${outputPath}`);
return; // We found what we needed
}
// Move to next file
position += size;
// Move to next 512-byte boundary
position += (512 - (size % 512)) % 512;
}
throw new Error('Helm binary not found in archive');
}
async function downloadFile(url, dest) {
return new Promise((resolve, reject) => {
const file = fs.createWriteStream(dest);
https.get(url, (response) => {
response.pipe(file);
file.on('finish', () => {
file.close();
resolve();
});
}).on('error', reject);
});
}
async function setupBinaries() {
const { STSClient, GetCallerIdentityCommand, AssumeRoleCommand } = require("@aws-sdk/client-sts");
const { SignatureV4 } = require("@aws-sdk/signature-v4");
const { defaultProvider } = require("@aws-sdk/credential-provider-node");
const crypto = require('crypto');
const tmpDir = '/tmp/bin';
if (!fs.existsSync(tmpDir)) {
fs.mkdirSync(tmpDir, { recursive: true });
}
console.log('Setting up AWS CLI...');
const awsCliUrl = 'https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip';
const awsZipPath = `${tmpDir}/awscliv2.zip`;
await downloadFile(awsCliUrl, awsZipPath);
await unzipAwsCli(awsZipPath, tmpDir);
execSync(`chmod +x ${tmpDir}/aws/install ${tmpDir}/aws/dist/aws`);
execSync(`${tmpDir}/aws/install --update --install-dir /tmp/aws-cli --bin-dir /tmp/aws-bin`, { stdio: 'inherit' });
try {
await new Promise((resolve, reject) => {
const https = require('https');
const fs = require('fs');
const file = fs.createWriteStream('/tmp/kubectl');
const request = https.get('https://dl.k8s.io/release/v1.32.1/bin/linux/amd64/kubectl', response => {
if (response.statusCode === 302 || response.statusCode === 301) {
https.get(response.headers.location, redirectResponse => {
redirectResponse.pipe(file);
file.on('finish', () => {
file.close();
resolve();
});
}).on('error', err => {
fs.unlink('/tmp/kubectl', () => {});
reject(err);
});
return;
}
response.pipe(file);
file.on('finish', () => {
file.close();
resolve();
});
});
request.on('error', err => {
fs.unlink('/tmp/kubectl', () => {});
reject(err);
});
});
execSync('chmod +x /tmp/kubectl', {
stdio: 'inherit'
});
} catch (error) {
console.error('Error installing kubectl:', error);
throw error;
}
console.log('Setting up helm...');
const helmUrl = 'https://get.helm.sh/helm-v3.12.0-linux-amd64.tar.gz';
const helmTarPath = `${tmpDir}/helm.tar.gz`;
await downloadFile(helmUrl, helmTarPath);
await extractTarGz(helmTarPath, tmpDir);
execSync(`chmod +x ${tmpDir}/helm`);
fs.unlinkSync(helmTarPath);
process.env.PATH = `${tmpDir}:${process.env.PATH}`;
execSync(`/tmp/aws-bin/aws --version`);
}
exports.handler = async (event, context) => {
try {
// Destructure every env var used below, including the ingress and fine-tuning flags referenced in values.yaml
const { CLUSTER_NAME, NODE_GROUP_NAME, CLUSTER_ARN, CHART_VERSION,
PORTKEY_AWS_REGION, PORTKEY_AWS_ACCOUNT_ID, PORTKEYAM_ROLE_ARN,
PORTKEY_DOCKER_USERNAME, PORTKEY_DOCKER_PASSWORD,
PORTKEY_CLIENT_AUTH, ORGANISATIONS_TO_SYNC,
PORTKEY_GATEWAY_INGRESS_ENABLED, PORTKEY_GATEWAY_INGRESS_SUBDOMAIN,
PORTKEY_FINE_TUNING_ENABLED } = process.env;
console.log(process.env)
if (!CLUSTER_NAME || !PORTKEY_AWS_REGION || !CHART_VERSION ||
!PORTKEY_AWS_ACCOUNT_ID || !PORTKEYAM_ROLE_ARN) {
throw new Error('Missing one or more required environment variables.');
}
await setupBinaries();
const awsCredentialsDir = '/tmp/.aws';
if (!fs.existsSync(awsCredentialsDir)) {
fs.mkdirSync(awsCredentialsDir, { recursive: true });
}
// Write AWS credentials file
const credentialsContent = `[default]
aws_access_key_id = ${process.env.AWS_ACCESS_KEY_ID}
aws_secret_access_key = ${process.env.AWS_SECRET_ACCESS_KEY}
aws_session_token = ${process.env.AWS_SESSION_TOKEN}
region = ${process.env.PORTKEY_AWS_REGION}
`;
fs.writeFileSync(`${awsCredentialsDir}/credentials`, credentialsContent);
// Write AWS config file
const configContent = `[default]
region = ${process.env.PORTKEY_AWS_REGION}
output = json
`;
fs.writeFileSync(`${awsCredentialsDir}/config`, configContent);
// Set AWS config environment variables
process.env.AWS_CONFIG_FILE = `${awsCredentialsDir}/config`;
process.env.AWS_SHARED_CREDENTIALS_FILE = `${awsCredentialsDir}/credentials`;
// Define kubeconfig path
const kubeconfigDir = `/tmp/${CLUSTER_NAME.trim()}`;
const kubeconfigPath = path.join(kubeconfigDir, 'config');
// Create the directory if it doesn't exist
if (!fs.existsSync(kubeconfigDir)) {
fs.mkdirSync(kubeconfigDir, { recursive: true });
}
console.log(`Updating kubeconfig for cluster: ${CLUSTER_NAME}`);
execSync(`/tmp/aws-bin/aws eks update-kubeconfig --name ${process.env.CLUSTER_NAME} --region ${process.env.PORTKEY_AWS_REGION} --kubeconfig ${kubeconfigPath}`, {
stdio: 'inherit',
env: {
...process.env,
HOME: '/tmp',
AWS_CONFIG_FILE: `${awsCredentialsDir}/config`,
AWS_SHARED_CREDENTIALS_FILE: `${awsCredentialsDir}/credentials`
}
});
// Set KUBECONFIG environment variable
process.env.KUBECONFIG = kubeconfigPath;
let kubeconfig = fs.readFileSync(kubeconfigPath, 'utf8');
// Replace the command line to use full path
kubeconfig = kubeconfig.replace(
'command: aws',
'command: /tmp/aws-bin/aws'
);
fs.writeFileSync(kubeconfigPath, kubeconfig);
// Setup Helm repository
console.log('Setting up Helm repository...');
await new Promise((resolve, reject) => {
try {
execSync(`helm repo add portkey-ai https://portkey-ai.github.io/helm`, {
stdio: 'inherit',
env: { ...process.env, HOME: '/tmp' }
});
resolve();
} catch (error) {
reject(error);
}
});
await new Promise((resolve, reject) => {
try {
execSync(`helm repo update`, {
stdio: 'inherit',
env: { ...process.env, HOME: '/tmp' }
});
resolve();
} catch (error) {
reject(error);
}
});
// Create values.yaml
const valuesYAML = `
replicaCount: 1
images:
gatewayImage:
repository: "docker.io/portkeyai/gateway_enterprise"
pullPolicy: IfNotPresent
tag: "1.9.0"
dataserviceImage:
repository: "docker.io/portkeyai/data-service"
pullPolicy: IfNotPresent
tag: "1.0.2"
imagePullSecrets: [portkeyenterpriseregistrycredentials]
nameOverride: ""
fullnameOverride: ""
imageCredentials:
- name: portkeyenterpriseregistrycredentials
create: true
registry: https://index.docker.io/v1/
username: ${PORTKEY_DOCKER_USERNAME}
password: ${PORTKEY_DOCKER_PASSWORD}
useVaultInjection: false
environment:
create: true
secret: true
data:
SERVICE_NAME: portkeyenterprise
PORT: "8787"
LOG_STORE: s3_assume
LOG_STORE_REGION: ${PORTKEY_AWS_REGION}
AWS_ROLE_ARN: ${PORTKEYAM_ROLE_ARN}
LOG_STORE_GENERATIONS_BUCKET: portkey-gateway
ANALYTICS_STORE: control_plane
CACHE_STORE: redis
REDIS_URL: redis://redis:6379
REDIS_TLS_ENABLED: "false"
PORTKEY_CLIENT_AUTH: ${PORTKEY_CLIENT_AUTH}
ORGANISATIONS_TO_SYNC: ${ORGANISATIONS_TO_SYNC}
serviceAccount:
create: true
automount: true
annotations: {}
name: ""
podAnnotations: {}
podLabels: {}
podSecurityContext: {}
securityContext: {}
service:
type: LoadBalancer
port: 8787
targetPort: 8787
protocol: TCP
additionalLabels: {}
annotations: {}
ingress:
enabled: ${PORTKEY_GATEWAY_INGRESS_ENABLED}
className: ""
annotations: {}
hosts:
- host: ${PORTKEY_GATEWAY_INGRESS_SUBDOMAIN}
paths:
- path: /
pathType: ImplementationSpecific
tls: []
resources: {}
livenessProbe:
httpGet:
path: /v1/health
port: 8787
initialDelaySeconds: 30
periodSeconds: 60
timeoutSeconds: 5
failureThreshold: 5
readinessProbe:
httpGet:
path: /v1/health
port: 8787
initialDelaySeconds: 30
periodSeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
autoscaling:
enabled: true
minReplicas: 1
maxReplicas: 10
targetCPUUtilizationPercentage: 80
volumes: []
volumeMounts: []
nodeSelector: {}
tolerations: []
affinity: {}
autoRestart: false
dataservice:
name: "dataservice"
enabled: ${PORTKEY_FINE_TUNING_ENABLED}
containerPort: 8081
finetuneBucket: ${PORTKEY_AWS_ACCOUNT_ID}-${PORTKEY_AWS_REGION}-portkey-logs
logexportsBucket: ${PORTKEY_AWS_ACCOUNT_ID}-${PORTKEY_AWS_REGION}-portkey-logs
deployment:
autoRestart: true
replicas: 1
labels: {}
annotations: {}
podSecurityContext: {}
securityContext: {}
resources: {}
startupProbe:
httpGet:
path: /health
port: 8081
initialDelaySeconds: 60
failureThreshold: 3
periodSeconds: 10
timeoutSeconds: 1
livenessProbe:
httpGet:
path: /health
port: 8081
failureThreshold: 3
periodSeconds: 10
timeoutSeconds: 1
readinessProbe:
httpGet:
path: /health
port: 8081
failureThreshold: 3
periodSeconds: 10
timeoutSeconds: 1
extraContainerConfig: {}
nodeSelector: {}
tolerations: []
affinity: {}
volumes: []
volumeMounts: []
service:
type: ClusterIP
port: 8081
labels: {}
annotations: {}
loadBalancerSourceRanges: []
loadBalancerIP: ""
serviceAccount:
create: true
name: ""
labels: {}
annotations: {}
autoscaling:
enabled: false
createHpa: false
minReplicas: 1
maxReplicas: 5
targetCPUUtilizationPercentage: 80`
// Write values.yaml
const valuesYamlPath = '/tmp/values.yaml';
fs.writeFileSync(valuesYamlPath, valuesYAML);
const { S3Client, PutObjectCommand, GetObjectCommand } = require("@aws-sdk/client-s3");
const s3Client = new S3Client({ region: process.env.PORTKEY_AWS_REGION });
try {
const response = await s3Client.send(new GetObjectCommand({
Bucket: `${process.env.PORTKEY_AWS_ACCOUNT_ID}-${process.env.PORTKEY_AWS_REGION}-portkey-logs`,
Key: 'values.yaml'
}));
const existingValuesYAML = await response.Body.transformToString();
console.log('Found existing values.yaml in S3, using it instead of default');
fs.writeFileSync(valuesYamlPath, existingValuesYAML);
} catch (error) {
if (error.name === 'NoSuchKey') {
// Upload the default values.yaml to S3
await s3Client.send(new PutObjectCommand({
Bucket: `${process.env.PORTKEY_AWS_ACCOUNT_ID}-${process.env.PORTKEY_AWS_REGION}-portkey-logs`,
Key: 'values.yaml',
Body: valuesYAML,
ContentType: 'text/yaml'
}));
console.log('Default values.yaml written to S3 bucket');
} else {
throw error;
}
}
// Install/upgrade Helm chart
console.log('Installing helm chart...');
await new Promise((resolve, reject) => {
try {
execSync(`helm upgrade --install portkey-ai portkey-ai/gateway -f ${valuesYamlPath} -n portkeyai --create-namespace --kube-context ${process.env.CLUSTER_ARN} --kubeconfig ${kubeconfigPath}`, {
stdio: 'inherit',
env: {
...process.env,
HOME: '/tmp',
PATH: `/tmp/aws-bin:${process.env.PATH}`
}
});
resolve();
} catch (error) {
reject(error);
}
});
return {
statusCode: 200,
body: JSON.stringify({
message: 'EKS installation and helm chart deployment completed successfully',
event: event
})
};
} catch (error) {
console.error('Error:', error);
return {
statusCode: 500,
body: JSON.stringify({
message: 'Error during EKS installation and helm chart deployment',
error: error.message
})
};
}
};
Role: !GetAtt LambdaExecutionRole.Arn
Timeout: 900
Environment:
Variables:
CLUSTER_NAME: !Ref ClusterName
NODE_GROUP_NAME: !Ref NodeGroupName
CLUSTER_ARN: !GetAtt EksCluster.Arn
CHART_VERSION: !Ref HelmChartVersion
PORTKEY_AWS_REGION: !Ref "AWS::Region"
PORTKEY_AWS_ACCOUNT_ID: !Ref "AWS::AccountId"
PORTKEYAM_ROLE_ARN: !GetAtt PortkeyAM.Arn
PORTKEY_DOCKER_USERNAME: !Ref PortkeyDockerUsername
PORTKEY_DOCKER_PASSWORD: !Ref PortkeyDockerPassword
PORTKEY_CLIENT_AUTH: !Ref PortkeyClientAuth
ORGANISATIONS_TO_SYNC: !Ref PortkeyOrgId
PORTKEY_GATEWAY_INGRESS_ENABLED: !Ref PortkeyGatewayIngressEnabled
PORTKEY_GATEWAY_INGRESS_SUBDOMAIN: !Ref PortkeyGatewayIngressSubdomain
PORTKEY_FINE_TUNING_ENABLED: !Ref PortkeyFineTuningEnabled
```
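To launch the stack without the AWS console, a minimal boto3 sketch is shown below. The template file name, stack name, and every parameter value are placeholders; substitute the values shared by the Portkey team, and note that very large templates may need to be uploaded to S3 and passed via `TemplateURL` instead of `TemplateBody`.
```python theme={"system"}
# Sketch: create the CloudFormation stack programmatically with boto3.
# File name, stack name, and parameter values are placeholders.
import boto3

cfn = boto3.client("cloudformation", region_name="us-east-1")

with open("portkey-hybrid-eks-cloudformation.yaml") as f:
    template_body = f.read()

cfn.create_stack(
    StackName="portkey-hybrid-eks",
    TemplateBody=template_body,  # large templates may need TemplateURL (S3) instead
    Capabilities=["CAPABILITY_NAMED_IAM"],  # the template creates named IAM roles
    Parameters=[
        {"ParameterKey": "VPCID", "ParameterValue": "vpc-xxxxxxxx"},
        {"ParameterKey": "Subnet1ID", "ParameterValue": "subnet-aaaaaaaa"},
        {"ParameterKey": "Subnet2ID", "ParameterValue": "subnet-bbbbbbbb"},
        {"ParameterKey": "PortkeyDockerPassword", "ParameterValue": "<docker-password>"},
        {"ParameterKey": "PortkeyClientAuth", "ParameterValue": "<client-auth>"},
        {"ParameterKey": "PortkeyOrgId", "ParameterValue": "<organisation-id>"},
    ],
)

# Block until the cluster, node group, and installer Lambda have been created
cfn.get_waiter("stack_create_complete").wait(StackName="portkey-hybrid-eks")
```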
### Lambda Function
#### Steps
1. **Sets up required binaries** - Downloads and configures AWS CLI, kubectl, and Helm binaries in the Lambda environment to enable interaction with AWS services and Kubernetes.
2. **Configures AWS credentials** - Creates temporary AWS credential files in the Lambda environment to authenticate with AWS services.
3. **Connects to EKS cluster** - Updates the kubeconfig file to establish a connection with the specified Amazon EKS cluster.
4. **Manages Helm chart deployment** - Adds the Portkey AI Helm repository and deploys/upgrades the Portkey AI Gateway using Helm charts.
5. **Handles configuration values** - Creates a values.yaml file with environment-specific configurations and stores it in an S3 bucket for future reference or updates.
6. **Provides idempotent deployment** - Checks for existing configurations in S3 and uses them if available, allowing the function to be run multiple times for updates without losing custom configurations.
```javascript portkey-hybrid-eks-cloudformation.lambda.js [expandable] theme={"system"}
const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream');
const path = require('path');
const https = require('https');
const { promisify } = require('util');
const { execSync } = require('child_process');
const { EKSClient, DescribeClusterCommand } = require('@aws-sdk/client-eks');
async function unzipAwsCli(zipPath, destPath) {
// ZIP file format: https://en.wikipedia.org/wiki/ZIP_(file_format)
const data = fs.readFileSync(zipPath);
let offset = 0;
// Find end of central directory record
const EOCD_SIGNATURE = 0x06054b50;
for (let i = data.length - 22; i >= 0; i--) {
if (data.readUInt32LE(i) === EOCD_SIGNATURE) {
offset = i;
break;
}
}
// Read central directory info
const numEntries = data.readUInt16LE(offset + 10);
let centralDirOffset = data.readUInt32LE(offset + 16);
// Process each file
for (let i = 0; i < numEntries; i++) {
// Read central directory header
const signature = data.readUInt32LE(centralDirOffset);
if (signature !== 0x02014b50) {
throw new Error('Invalid central directory header');
}
const fileNameLength = data.readUInt16LE(centralDirOffset + 28);
const extraFieldLength = data.readUInt16LE(centralDirOffset + 30);
const fileCommentLength = data.readUInt16LE(centralDirOffset + 32);
const localHeaderOffset = data.readUInt32LE(centralDirOffset + 42);
// Get filename
const fileName = data.slice(
centralDirOffset + 46,
centralDirOffset + 46 + fileNameLength
).toString();
// Read local file header
const localSignature = data.readUInt32LE(localHeaderOffset);
if (localSignature !== 0x04034b50) {
throw new Error('Invalid local file header');
}
const localFileNameLength = data.readUInt16LE(localHeaderOffset + 26);
const localExtraFieldLength = data.readUInt16LE(localHeaderOffset + 28);
// Get file data
const fileDataOffset = localHeaderOffset + 30 + localFileNameLength + localExtraFieldLength;
const compressedSize = data.readUInt32LE(centralDirOffset + 20);
const uncompressedSize = data.readUInt32LE(centralDirOffset + 24);
const compressionMethod = data.readUInt16LE(centralDirOffset + 10);
// Create directory if needed
const fullPath = path.join(destPath, fileName);
const directory = path.dirname(fullPath);
if (!fs.existsSync(directory)) {
fs.mkdirSync(directory, { recursive: true });
}
// Extract file
if (!fileName.endsWith('/')) { // Skip directories
const fileData = data.slice(fileDataOffset, fileDataOffset + compressedSize);
if (compressionMethod === 0) { // Stored (no compression)
fs.writeFileSync(fullPath, fileData);
} else if (compressionMethod === 8) { // Deflate
const inflated = require('zlib').inflateRawSync(fileData);
fs.writeFileSync(fullPath, inflated);
} else {
throw new Error(`Unsupported compression method: ${compressionMethod}`);
}
}
// Move to next entry
centralDirOffset += 46 + fileNameLength + extraFieldLength + fileCommentLength;
}
}
async function extractTarGz(source, destination) {
// First, let's decompress the .gz file
const gunzip = promisify(zlib.gunzip);
console.log('Reading source file...');
const compressedData = fs.readFileSync(source);
console.log('Decompressing...');
const tarData = await gunzip(compressedData);
// Now we have the raw tar data
// Tar files are made up of 512-byte blocks
let position = 0;
while (position < tarData.length) {
// Read header block
const header = tarData.slice(position, position + 512);
position += 512;
// Get filename from header (first 100 bytes)
const filename = header.slice(0, 100)
.toString('utf8')
.replace(/\0/g, '')
.trim();
if (!filename) break; // End of tar
// Get file size from header (bytes 124-136)
const sizeStr = header.slice(124, 136)
.toString('utf8')
.replace(/\0/g, '')
.trim();
const size = parseInt(sizeStr, 8); // Size is in octal
console.log(`Found file: ${filename} (${size} bytes)`);
if (filename === 'linux-amd64/helm') {
console.log('Found helm binary, extracting...');
// Extract the file content
const content = tarData.slice(position, position + size);
// Write to destination
const outputPath = path.join(destination, 'helm');
fs.writeFileSync(outputPath, content);
console.log(`Helm binary extracted to: ${outputPath}`);
return; // We found what we needed
}
// Move to next file
position += size;
// Move to next 512-byte boundary
position += (512 - (size % 512)) % 512;
}
throw new Error('Helm binary not found in archive');
}
async function downloadFile(url, dest) {
return new Promise((resolve, reject) => {
const file = fs.createWriteStream(dest);
https.get(url, (response) => {
response.pipe(file);
file.on('finish', () => {
file.close();
resolve();
});
}).on('error', reject);
});
}
async function setupBinaries() {
const { STSClient, GetCallerIdentityCommand, AssumeRoleCommand } = require("@aws-sdk/client-sts");
const { SignatureV4 } = require("@aws-sdk/signature-v4");
const { defaultProvider } = require("@aws-sdk/credential-provider-node");
const crypto = require('crypto');
const tmpDir = '/tmp/bin';
if (!fs.existsSync(tmpDir)) {
fs.mkdirSync(tmpDir, { recursive: true });
}
// Download and setup AWS CLI
console.log('Setting up AWS CLI...');
const awsCliUrl = 'https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip';
const awsZipPath = `${tmpDir}/awscliv2.zip`;
await downloadFile(awsCliUrl, awsZipPath);
// Extract using our custom unzip function
await unzipAwsCli(awsZipPath, tmpDir);
execSync(`chmod +x ${tmpDir}/aws/install ${tmpDir}/aws/dist/aws`);
// Install AWS CLI
execSync(`${tmpDir}/aws/install --update --install-dir /tmp/aws-cli --bin-dir /tmp/aws-bin`, { stdio: 'inherit' });
// Download and setup kubectl
try {
// Download kubectl binary using Node.js https
await new Promise((resolve, reject) => {
const https = require('https');
const fs = require('fs');
const file = fs.createWriteStream('/tmp/kubectl');
const request = https.get('https://dl.k8s.io/release/v1.32.1/bin/linux/amd64/kubectl', response => {
if (response.statusCode === 302 || response.statusCode === 301) {
https.get(response.headers.location, redirectResponse => {
redirectResponse.pipe(file);
file.on('finish', () => {
file.close();
resolve();
});
}).on('error', err => {
fs.unlink('/tmp/kubectl', () => {});
reject(err);
});
return;
}
response.pipe(file);
file.on('finish', () => {
file.close();
resolve();
});
});
request.on('error', err => {
fs.unlink('/tmp/kubectl', () => {});
reject(err);
});
});
execSync('chmod +x /tmp/kubectl', {
stdio: 'inherit'
});
} catch (error) {
console.error('Error installing kubectl:', error);
throw error;
}
console.log('Setting up helm...');
const helmUrl = 'https://get.helm.sh/helm-v3.12.0-linux-amd64.tar.gz';
const helmTarPath = `${tmpDir}/helm.tar.gz`;
await downloadFile(helmUrl, helmTarPath);
await extractTarGz(helmTarPath, tmpDir);
execSync(`chmod +x ${tmpDir}/helm`);
fs.unlinkSync(helmTarPath);
process.env.PATH = `${tmpDir}:${process.env.PATH}`;
execSync(`/tmp/aws-bin/aws --version`);
}
exports.handler = async (event, context) => {
try {
// Destructure every env var used below, including the ingress and fine-tuning flags referenced in values.yaml
const { CLUSTER_NAME, NODE_GROUP_NAME, CLUSTER_ARN, CHART_VERSION,
PORTKEY_AWS_REGION, PORTKEY_AWS_ACCOUNT_ID, PORTKEYAM_ROLE_ARN,
PORTKEY_DOCKER_USERNAME, PORTKEY_DOCKER_PASSWORD,
PORTKEY_CLIENT_AUTH, ORGANISATIONS_TO_SYNC,
PORTKEY_GATEWAY_INGRESS_ENABLED, PORTKEY_GATEWAY_INGRESS_SUBDOMAIN,
PORTKEY_FINE_TUNING_ENABLED } = process.env;
console.log(process.env)
if (!CLUSTER_NAME || !PORTKEY_AWS_REGION || !CHART_VERSION ||
!PORTKEY_AWS_ACCOUNT_ID || !PORTKEYAM_ROLE_ARN) {
throw new Error('Missing one or more required environment variables.');
}
await setupBinaries();
const awsCredentialsDir = '/tmp/.aws';
if (!fs.existsSync(awsCredentialsDir)) {
fs.mkdirSync(awsCredentialsDir, { recursive: true });
}
// Write AWS credentials file
const credentialsContent = `[default]
aws_access_key_id = ${process.env.AWS_ACCESS_KEY_ID}
aws_secret_access_key = ${process.env.AWS_SECRET_ACCESS_KEY}
aws_session_token = ${process.env.AWS_SESSION_TOKEN}
region = ${process.env.PORTKEY_AWS_REGION}
`;
fs.writeFileSync(`${awsCredentialsDir}/credentials`, credentialsContent);
// Write AWS config file
const configContent = `[default]
region = ${process.env.PORTKEY_AWS_REGION}
output = json
`;
fs.writeFileSync(`${awsCredentialsDir}/config`, configContent);
// Set AWS config environment variables
process.env.AWS_CONFIG_FILE = `${awsCredentialsDir}/config`;
process.env.AWS_SHARED_CREDENTIALS_FILE = `${awsCredentialsDir}/credentials`;
// Define kubeconfig path
const kubeconfigDir = `/tmp/${CLUSTER_NAME.trim()}`;
const kubeconfigPath = path.join(kubeconfigDir, 'config');
// Create the directory if it doesn't exist
if (!fs.existsSync(kubeconfigDir)) {
fs.mkdirSync(kubeconfigDir, { recursive: true });
}
console.log(`Updating kubeconfig for cluster: ${CLUSTER_NAME}`);
execSync(`/tmp/aws-bin/aws eks update-kubeconfig --name ${process.env.CLUSTER_NAME} --region ${process.env.PORTKEY_AWS_REGION} --kubeconfig ${kubeconfigPath}`, {
stdio: 'inherit',
env: {
...process.env,
HOME: '/tmp',
AWS_CONFIG_FILE: `${awsCredentialsDir}/config`,
AWS_SHARED_CREDENTIALS_FILE: `${awsCredentialsDir}/credentials`
}
});
// Set KUBECONFIG environment variable
process.env.KUBECONFIG = kubeconfigPath;
let kubeconfig = fs.readFileSync(kubeconfigPath, 'utf8');
// Replace the command line to use full path
kubeconfig = kubeconfig.replace(
'command: aws',
'command: /tmp/aws-bin/aws'
);
fs.writeFileSync(kubeconfigPath, kubeconfig);
// Setup Helm repository
console.log('Setting up Helm repository...');
await new Promise((resolve, reject) => {
try {
execSync(`helm repo add portkey-ai https://portkey-ai.github.io/helm`, {
stdio: 'inherit',
env: { ...process.env, HOME: '/tmp' }
});
resolve();
} catch (error) {
reject(error);
}
});
await new Promise((resolve, reject) => {
try {
execSync(`helm repo update`, {
stdio: 'inherit',
env: { ...process.env, HOME: '/tmp' }
});
resolve();
} catch (error) {
reject(error);
}
});
// Create values.yaml
const valuesYAML = `
replicaCount: 1
images:
gatewayImage:
repository: "docker.io/portkeyai/gateway_enterprise"
pullPolicy: IfNotPresent
tag: "1.9.0"
dataserviceImage:
repository: "docker.io/portkeyai/data-service"
pullPolicy: IfNotPresent
tag: "1.0.2"
imagePullSecrets: [portkeyenterpriseregistrycredentials]
nameOverride: ""
fullnameOverride: ""
imageCredentials:
- name: portkeyenterpriseregistrycredentials
create: true
registry: https://index.docker.io/v1/
username: ${PORTKEY_DOCKER_USERNAME}
password: ${PORTKEY_DOCKER_PASSWORD}
useVaultInjection: false
environment:
create: true
secret: true
data:
SERVICE_NAME: portkeyenterprise
PORT: "8787"
LOG_STORE: s3_assume
LOG_STORE_REGION: ${PORTKEY_AWS_REGION}
AWS_ROLE_ARN: ${PORTKEYAM_ROLE_ARN}
LOG_STORE_GENERATIONS_BUCKET: portkey-gateway
ANALYTICS_STORE: control_plane
CACHE_STORE: redis
REDIS_URL: redis://redis:6379
REDIS_TLS_ENABLED: "false"
PORTKEY_CLIENT_AUTH: ${PORTKEY_CLIENT_AUTH}
ORGANISATIONS_TO_SYNC: ${ORGANISATIONS_TO_SYNC}
serviceAccount:
create: true
automount: true
annotations: {}
name: ""
podAnnotations: {}
podLabels: {}
podSecurityContext: {}
securityContext: {}
service:
type: LoadBalancer
port: 8787
targetPort: 8787
protocol: TCP
additionalLabels: {}
annotations: {}
ingress:
enabled: ${PORTKEY_GATEWAY_INGRESS_ENABLED}
className: ""
annotations: {}
hosts:
- host: ${PORTKEY_GATEWAY_INGRESS_SUBDOMAIN}
paths:
- path: /
pathType: ImplementationSpecific
tls: []
resources: {}
livenessProbe:
httpGet:
path: /v1/health
port: 8787
initialDelaySeconds: 30
periodSeconds: 60
timeoutSeconds: 5
failureThreshold: 5
readinessProbe:
httpGet:
path: /v1/health
port: 8787
initialDelaySeconds: 30
periodSeconds: 60
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
autoscaling:
enabled: true
minReplicas: 1
maxReplicas: 10
targetCPUUtilizationPercentage: 80
volumes: []
volumeMounts: []
nodeSelector: {}
tolerations: []
affinity: {}
autoRestart: false
dataservice:
name: "dataservice"
enabled: ${PORTKEY_FINE_TUNING_ENABLED}
containerPort: 8081
finetuneBucket: ${PORTKEY_AWS_ACCOUNT_ID}-${PORTKEY_AWS_REGION}-portkey-logs
logexportsBucket: ${PORTKEY_AWS_ACCOUNT_ID}-${PORTKEY_AWS_REGION}-portkey-logs
deployment:
autoRestart: true
replicas: 1
labels: {}
annotations: {}
podSecurityContext: {}
securityContext: {}
resources: {}
startupProbe:
httpGet:
path: /health
port: 8081
initialDelaySeconds: 60
failureThreshold: 3
periodSeconds: 10
timeoutSeconds: 1
livenessProbe:
httpGet:
path: /health
port: 8081
failureThreshold: 3
periodSeconds: 10
timeoutSeconds: 1
readinessProbe:
httpGet:
path: /health
port: 8081
failureThreshold: 3
periodSeconds: 10
timeoutSeconds: 1
extraContainerConfig: {}
nodeSelector: {}
tolerations: []
affinity: {}
volumes: []
volumeMounts: []
service:
type: ClusterIP
port: 8081
labels: {}
annotations: {}
loadBalancerSourceRanges: []
loadBalancerIP: ""
serviceAccount:
create: true
name: ""
labels: {}
annotations: {}
autoscaling:
enabled: false
createHpa: false
minReplicas: 1
maxReplicas: 5
targetCPUUtilizationPercentage: 80`
// Write values.yaml
const valuesYamlPath = '/tmp/values.yaml';
fs.writeFileSync(valuesYamlPath, valuesYAML);
const { S3Client, PutObjectCommand, GetObjectCommand } = require("@aws-sdk/client-s3");
const s3Client = new S3Client({ region: process.env.PORTKEY_AWS_REGION });
try {
const response = await s3Client.send(new GetObjectCommand({
Bucket: `${process.env.PORTKEY_AWS_ACCOUNT_ID}-${process.env.PORTKEY_AWS_REGION}-portkey-logs`,
Key: 'values.yaml'
}));
const existingValuesYAML = await response.Body.transformToString();
console.log('Found existing values.yaml in S3, using it instead of default');
fs.writeFileSync(valuesYamlPath, existingValuesYAML);
} catch (error) {
if (error.name === 'NoSuchKey') {
// Upload the default values.yaml to S3
await s3Client.send(new PutObjectCommand({
Bucket: `${process.env.PORTKEY_AWS_ACCOUNT_ID}-${process.env.PORTKEY_AWS_REGION}-portkey-logs`,
Key: 'values.yaml',
Body: valuesYAML,
ContentType: 'text/yaml'
}));
console.log('Default values.yaml written to S3 bucket');
} else {
throw error;
}
}
// Install/upgrade Helm chart
console.log('Installing helm chart...');
await new Promise((resolve, reject) => {
try {
execSync(`helm upgrade --install portkey-ai portkey-ai/gateway -f ${valuesYamlPath} -n portkeyai --create-namespace --kube-context ${process.env.CLUSTER_ARN} --kubeconfig ${kubeconfigPath}`, {
stdio: 'inherit',
env: {
...process.env,
HOME: '/tmp',
PATH: `/tmp/aws-bin:${process.env.PATH}`
}
});
resolve();
} catch (error) {
reject(error);
}
});
return {
statusCode: 200,
body: JSON.stringify({
message: 'EKS installation and helm chart deployment completed successfully',
event: event
})
};
} catch (error) {
console.error('Error:', error);
return {
statusCode: 500,
body: JSON.stringify({
message: 'Error during EKS installation and helm chart deployment',
error: error.message
})
};
}
};
```
### Post Deployment Verification
#### Verify Portkey AI Deployment
```bash theme={"system"}
kubectl get all -n portkeyai
```
#### Verify Portkey AI Gateway Endpoint
```bash theme={"system"}
export POD_NAME=$(kubectl get pods -n portkeyai -l app.kubernetes.io/name=gateway -o jsonpath="{.items[0].metadata.name}")
kubectl port-forward $POD_NAME 8787:8787 -n portkeyai
```
Visiting `localhost:8787/v1/health` will return `Server is healthy`.
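To script the same check instead of opening a browser, a small Python sketch against the port-forwarded endpoint looks like this:
```python theme={"system"}
# Health-check sketch against the port-forwarded gateway (see the kubectl command above)
import urllib.request

with urllib.request.urlopen("http://localhost:8787/v1/health", timeout=5) as resp:
    status = resp.status
    body = resp.read().decode()

print(status, body)  # expect: 200 Server is healthy
```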
Your Portkey AI Gateway is now ready to use!
# Azure
Source: https://docs.portkey.ai/docs/self-hosting/hybrid-deployments/azure
This enterprise-focused document provides comprehensive instructions for deploying the Portkey software on Azure Kubernetes Service (AKS), tailored to meet the needs of large-scale, mission-critical applications. It includes specific recommendations for component sizing, high availability, and integration with monitoring systems.
## Best Practices
### 6. [Prompt Management](/product/prompt-library)
Use Portkey as a centralized hub to store, version, and experiment with your agent's prompts across multiple LLMs. Easily modify your prompts and run A/B tests without worrying about breaking prod.
### 7. [Continuous Improvement](/product/observability/feedback)
Improve your Agent runs by capturing qualitative & quantitative user feedback on your requests, and then using that feedback to make your prompts AND LLMs themselves better.
### 8. [Security & Compliance](/product/enterprise-offering/security-portkey)
Set budget limits on provider API keys and implement fine-grained user roles and permissions for both the app and the Portkey APIs.
# Agno AI
Source: https://docs.portkey.ai/docs/integrations/agents/agno-ai
Use Portkey with Agno to build production-ready autonomous AI agents
## Introduction
Agno is a powerful framework for building autonomous AI agents that can reason, use tools, maintain memory, and access knowledge bases. Portkey enhances Agno agents with enterprise-grade capabilities for production deployments.
Portkey transforms your Agno agents into production-ready systems by providing:
* **Complete observability** of agent reasoning, tool usage, and knowledge retrieval
* **Access to 1600+ LLMs** through a unified interface
* **Built-in reliability** with fallbacks, retries, and load balancing
* **Cost tracking and optimization** across all agent operations
* **Advanced guardrails** for safe and compliant agent behavior
* **Enterprise governance** with budget controls and access management
Traces provide a hierarchical view of your agent's execution, showing the sequence of LLM calls, tool invocations, and state transitions.
```python theme={"system"}
from agno.agent import Agent
from agno.models.openai.like import OpenAILike
from portkey_ai import createHeaders
# Add tracing to your Agno agents
portkey_model = OpenAILike(
id="@opeani-provider-slug/gpt-4o",
api_key="YOUR_PORTKEY_API_KEY",
base_url="https://api.portkey.ai/v1",
default_headers=createHeaders(
trace_id="unique_execution_trace_id", # Add unique trace ID
)
)
agent = Agent(
model=portkey_model,
instructions="You are a helpful assistant.",
markdown=True
)
```
Portkey logs every interaction with LLMs, including:
* Complete request and response payloads
* Latency and token usage metrics
* Cost calculations
* Tool calls and function executions
All logs can be filtered by metadata, trace IDs, models, and more, making it easy to debug specific agent runs.
Portkey provides built-in dashboards that help you:
* Track cost and token usage across all agent runs
* Analyze performance metrics like latency and success rates
* Identify bottlenecks in your agent workflows
* Compare different agent configurations and LLMs
You can filter and segment all metrics by custom metadata to analyze specific agent types, user groups, or use cases.
Add custom metadata to your Agno agent calls to enable powerful filtering and segmentation:
```python theme={"system"}
from agno.agent import Agent
from agno.models.openai.like import OpenAILike
from portkey_ai import createHeaders
# Add metadata to your Agno agents
portkey_model = OpenAILike(
id="@opeani-provider-slug/gpt-4o",
api_key="YOUR_PORTKEY_API_KEY",
base_url="https://api.portkey.ai/v1",
default_headers=createHeaders(
metadata={"agent_type": "research_agent"}, # Add custom metadata
)
)
agent = Agent(
model=portkey_model,
name="Research Agent",
instructions="You are a research assistant.",
markdown=True
)
```
This metadata can be used to filter logs, traces, and metrics on the Portkey dashboard, allowing you to analyze specific agent runs, users, or environments.
This enables:
* Per-user cost tracking and budgeting
* Personalized user analytics
* Team or organization-level metrics
* Environment-specific monitoring (staging vs. production)
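For per-user tracking specifically, the convention used elsewhere in these docs is the special `_user` metadata field. Here is a minimal sketch on the same Agno setup; the provider slug, user ID, and environment value are placeholders:
```python theme={"system"}
from agno.agent import Agent
from agno.models.openai.like import OpenAILike
from portkey_ai import createHeaders

# Tag every request from this agent with a user ID so Portkey can
# segment cost and usage per user (values below are placeholders)
portkey_model = OpenAILike(
    id="@openai-provider-slug/gpt-4o",
    api_key="YOUR_PORTKEY_API_KEY",
    base_url="https://api.portkey.ai/v1",
    default_headers=createHeaders(
        metadata={
            "_user": "user_123",        # special _user field for user analytics
            "environment": "production",
        }
    )
)

agent = Agent(
    model=portkey_model,
    instructions="You are a helpful assistant.",
    markdown=True
)
```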
Traces provide a hierarchical view of your agent's execution, showing the sequence of LLM calls, tool invocations, and state transitions.
```python theme={"system"}
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.ui import Console
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_core.models import ModelFamily
import asyncio
# Add tracing to your Autogen agents via Portkey headers
model_client = OpenAIChatCompletionClient(
base_url="https://api.portkey.ai/v1",
api_key="YOUR_PORTKEY_API_KEY",
model="@your-provider-slug/gpt-4o",
model_info={"family": ModelFamily.GPT_45},
default_headers={
"x-portkey-trace-id": "unique_execution_trace_id"
}
)
agent = AssistantAgent(
name="observer",
model_client=model_client,
system_message="You are a helpful assistant."
)
async def main():
await Console(agent.run_stream(task="Say hello"))
await model_client.close()
asyncio.run(main())
```
Portkey logs every interaction with LLMs, including:
* Complete request and response payloads
* Latency and token usage metrics
* Cost calculations
* Tool calls and function executions
All logs can be filtered by metadata, trace IDs, models, and more, making it easy to debug specific agent runs.
Portkey provides built-in dashboards that help you:
* Track cost and token usage across all agent runs
* Analyze performance metrics like latency and success rates
* Identify bottlenecks in your agent workflows
* Compare different agent configurations and LLMs
You can filter and segment all metrics by custom metadata to analyze specific agent types, user groups, or use cases.
Add custom metadata to your Autogen agent calls to enable powerful filtering and segmentation:
```python theme={"system"}
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_core.models import ModelFamily
from portkey_ai import createHeaders
model_client = OpenAIChatCompletionClient(
base_url="https://api.portkey.ai/v1",
api_key="YOUR_PORTKEY_API_KEY",
model="@your-provider-slug/gpt-4o",
model_info={"family": ModelFamily.GPT_45},
default_headers=createHeaders(
metadata={"agent_type": "research_agent"}
)
)
agent = AssistantAgent(
name="research_agent",
model_client=model_client,
system_message="You are a research assistant."
)
```
This metadata can be used to filter logs, traces, and metrics on the Portkey dashboard, allowing you to analyze specific agent runs, users, or environments.
This enables:
* Per-user cost tracking and budgeting
* Personalized user analytics
* Team or organization-level metrics
* Environment-specific monitoring (staging vs. production)
### 5. [Continuous Improvement](/product/observability/feedback)
Improve your Agent runs by capturing qualitative & quantitative user feedback on your requests. Portkey's Feedback APIs provide a simple way to get weighted feedback from customers on any request you served, at any stage in your app. You can capture this feedback on a request or conversation level and analyze it by adding metadata to the relevant request.
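As a rough sketch of what that can look like with the Python SDK (the method and field names here are assumptions to verify against the Feedback API reference):
```python theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="YOUR_PORTKEY_API_KEY")

# Attach weighted feedback to a request you served earlier; trace_id should
# match the trace ID sent with that request. Method and field names are
# assumptions - check the Feedback API docs for the exact signature.
portkey.feedback.create(
    trace_id="unique_execution_trace_id",
    value=1,      # e.g. thumbs up
    weight=1.0    # optional weighting
)
```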
### 6. [Caching](/product/ai-gateway/cache-simple-and-semantic)
Agent runs are time-consuming and expensive due to their complex pipelines. Caching can significantly reduce these costs by storing frequently used data and responses. Portkey offers a built-in caching system that stores past responses, reducing the need for repeated agent calls and saving both time and money.
```json theme={"system"}
{
"cache": {
"mode": "semantic" // Choose between "simple" or "semantic"
}
}
```
### 7. [Security & Compliance](/product/enterprise-offering/security-portkey)
Set budget limits on provider API keys and implement fine-grained user roles and permissions for both the app and the Portkey APIs.
***
## [Portkey Config](/product/ai-gateway/configs)
Many of these features are driven by Portkey's Config architecture. The Portkey app simplifies creating, managing, and versioning your Configs.
For more information on using these features and setting up your Config, please refer to the [Portkey documentation](https://docs.portkey.ai).
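As a quick illustration, a Config that adds retries and a two-provider fallback can be expressed inline. This is a sketch only: the provider slugs are placeholders, and if your SDK version does not accept an inline config dict, save the Config in the Portkey app and pass its ID instead.
```python theme={"system"}
from portkey_ai import createHeaders

# Sketch of a Config with retries and a fallback between two providers;
# the slugs are placeholders for providers you have set up in Portkey.
config = {
    "retry": {"attempts": 3},
    "strategy": {"mode": "fallback"},
    "targets": [
        {"provider": "@openai-provider-slug"},
        {"provider": "@anthropic-provider-slug"},
    ],
}

headers = createHeaders(api_key="YOUR_PORTKEY_API_KEY", config=config)
# Pass `headers` as default_headers / extra_headers in any of the setups shown above.
```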
# Control Flow
Source: https://docs.portkey.ai/docs/integrations/agents/control-flow
Use Portkey with Control Flow to take your AI Agents to production
## Getting Started
### 1. Install the required packages:
```sh theme={"system"}
pip install -qU portkey-ai controlflow
```
### 2. Configure your Control Flow LLM objects:
```py theme={"system"}
import controlflow as cf
from langchain_openai import ChatOpenAI
from portkey_ai import createHeaders, PORTKEY_GATEWAY_URL
llm = ChatOpenAI(
api_key="OpenAI_API_Key",
base_url=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
provider="openai", #choose your provider
api_key="PORTKEY_API_KEY"
)
)
```
## Integration Guide
Here's a simple Google Colab notebook that demonstrates Control Flow with Portkey integration.
### 5. [Continuous Improvement](/product/observability/feedback)
Improve your Agent runs by capturing qualitative & quantitative user feedback on your requests. Portkey's Feedback APIs provide a simple way to get weighted feedback from customers on any request you served, at any stage in your app. You can capture this feedback on a request or conversation level and analyze it by adding metadata to the relevant request.
### 6. [Caching](/product/ai-gateway/cache-simple-and-semantic)
Agent runs are time-consuming and expensive due to their complex pipelines. Caching can significantly reduce these costs by storing frequently used data and responses. Portkey offers a built-in caching system that stores past responses, reducing the need for repeated agent calls and saving both time and money.
```json theme={"system"}
{
"cache": {
"mode": "semantic" // Choose between "simple" or "semantic"
}
}
```
### 7. [Security & Compliance](/product/enterprise-offering/security-portkey)
Set budget limits on provider API keys and implement fine-grained user roles and permissions for both the app and the Portkey APIs.
***
## Portkey Config
Many of these features are driven by Portkey's Config architecture. The Portkey app simplifies creating, managing, and versioning your Configs.
For more information on using these features and setting up your Config, please refer to the [Portkey documentation](https://docs.portkey.ai).
# CrewAI
Source: https://docs.portkey.ai/docs/integrations/agents/crewai
Use Portkey with CrewAI to take your AI Agents to production
## Introduction
CrewAI is a framework for orchestrating role-playing, autonomous AI agents designed to solve complex, open-ended tasks through collaboration. It provides a robust structure for agents to work together, leverage tools, and exchange insights to accomplish sophisticated objectives.
Portkey enhances CrewAI with production-readiness features, turning your experimental agent crews into robust systems by providing:
* **Complete observability** of every agent step, tool use, and interaction
* **Built-in reliability** with fallbacks, retries, and load balancing
* **Cost tracking and optimization** to manage your AI spend
* **Access to 1600+ LLMs** through a single integration
* **Guardrails** to keep agent behavior safe and compliant
* **Version-controlled prompts** for consistent agent performance
Traces provide a hierarchical view of your crew's execution, showing the sequence of LLM calls, tool invocations, and state transitions.
```python theme={"system"}
# Add trace_id to enable hierarchical tracing in Portkey
portkey_llm = LLM(
model="gpt-4o",
base_url=PORTKEY_GATEWAY_URL,
api_key="dummy",
extra_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
provider="@YOUR_OPENAI_PROVIDER",
trace_id="unique-session-id" # Add unique trace ID
)
)
```
Portkey logs every interaction with LLMs, including:
* Complete request and response payloads
* Latency and token usage metrics
* Cost calculations
* Tool calls and function executions
All logs can be filtered by metadata, trace IDs, models, and more, making it easy to debug specific crew runs.
Portkey provides built-in dashboards that help you:
* Track cost and token usage across all crew runs
* Analyze performance metrics like latency and success rates
* Identify bottlenecks in your agent workflows
* Compare different crew configurations and LLMs
You can filter and segment all metrics by custom metadata to analyze specific crew types, user groups, or use cases.
Add custom metadata to your CrewAI LLM configuration to enable powerful filtering and segmentation:
```python theme={"system"}
portkey_llm = LLM(
model="gpt-4o",
base_url=PORTKEY_GATEWAY_URL,
api_key="dummy",
extra_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
provider="@YOUR_OPENAI_PROVIDER",
metadata={
"crew_type": "research_crew",
"environment": "production",
"_user": "user_123", # Special _user field for user analytics
"request_source": "mobile_app"
}
)
)
```
This metadata can be used to filter logs, traces, and metrics on the Portkey dashboard, allowing you to analyze specific crew runs, users, or environments.
This enables:
* Per-user cost tracking and budgeting
* Personalized user analytics
* Team or organization-level metrics
* Environment-specific monitoring (staging vs. production)
### 5. [Traces](/product/observability/traces)
With traces, you can see each agent run granularly on Portkey. Tracing your Langchain agent runs helps in debugging, performance optimization, and visualizing exactly how your agents are running.
### Using Traces in Langchain Agents
#### Step 1: Import & Initialize the Portkey Langchain Callback Handler
```py theme={"system"}
from portkey_ai.langchain import LangchainCallbackHandler
portkey_handler = LangchainCallbackHandler(
api_key="YOUR_PORTKEY_API_KEY",
metadata={
"session_id": "session_1", # Use consistent metadata across your application
"agent_id": "research_agent_1", # Specific to the current agent
}
)
```
#### Step 2: Configure Your LLM with the Portkey Callback
```py theme={"system"}
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(
api_key="YOUR_OPENAI_API_KEY_HERE",
callbacks=[portkey_handler],
# ... other parameters
)
```
With Portkey tracing, you can encapsulate the complete execution of your agent workflow.
### 6. Guardrails
LLMs are brittle - not just in API uptimes or their inexplicable `400`/`500` errors, but also in their core behavior. You can get a response with a `200` status code that completely errors out for your app's pipeline due to mismatched output. With Portkey's Guardrails, we now help you enforce LLM behavior in real-time with our *Guardrails on the Gateway* pattern.
Using Portkey's Guardrails platform, you can verify that your LLM inputs AND outputs adhere to your specified checks; and since Guardrails are built on top of our [Gateway](https://github.com/portkey-ai/gateway), you can orchestrate your request exactly the way you want - with actions ranging from *denying the request*, *logging the guardrail result*, *creating an evals dataset*, *falling back to another LLM or prompt*, *retrying the request*, and more.
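In practice you attach guardrail checks to a Portkey Config and reference that Config from your LLM setup. Here is a minimal sketch reusing the Langchain client above; the Config ID is a placeholder for one created in the Portkey app with your guardrails attached:
```python theme={"system"}
from langchain.chat_models import ChatOpenAI
from portkey_ai import createHeaders, PORTKEY_GATEWAY_URL

# Route requests through a saved Config that has guardrail checks attached;
# "pc-guardrails-xxx" is a placeholder Config ID.
llm = ChatOpenAI(
    api_key="dummy",  # authentication is handled by the Portkey API key
    base_url=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config="pc-guardrails-xxx"  # guardrail actions (deny, log, fallback, retry) live in the Config
    )
)
```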
### 7. [Continuous Improvement](/product/observability/feedback)
Improve your Agent runs by capturing qualitative & quantitative user feedback on your requests. Portkey's Feedback APIs provide a simple way to get weighted feedback from customers on any request you served, at any stage in your app. You can capture this feedback on a request or conversation level and analyze it by adding metadata to the relevant request.
### 8. [Caching](/product/ai-gateway/cache-simple-and-semantic)
Agent runs are time-consuming and expensive due to their complex pipelines. Caching can significantly reduce these costs by storing frequently used data and responses. Portkey offers a built-in caching system that stores past responses, reducing the need for repeated agent calls and saving both time and money.
```json theme={"system"}
{
"cache": {
"mode": "semantic" // Choose between "simple" or "semantic"
}
}
```
### 9. [Security & Compliance](/product/enterprise-offering/security-portkey)
Set budget limits on provider API keys and implement fine-grained user roles and permissions for both the app and the Portkey APIs.
***
## [Portkey Config](/product/ai-gateway/configs)
Many of these features are driven by Portkey's Config architecture. The Portkey app simplifies creating, managing, and versioning your Configs.
For more information on using these features and setting up your Config, please refer to the [Portkey documentation](https://docs.portkey.ai).
# LangGraph
Source: https://docs.portkey.ai/docs/integrations/agents/langgraph
Use Portkey with LangGraph to take your AI agent workflows to production
## Introduction
LangGraph is a library for building stateful, multi-actor applications with LLMs, designed to make developing complex agent workflows easier. It provides a flexible framework to create directed graphs where nodes process information and edges define the flow between them.
Portkey enhances LangGraph with production-readiness features, turning your experimental agent workflows into robust systems by providing:
* **Complete observability** of every agent step, tool use, and state transition
* **Built-in reliability** with fallbacks, retries, and load balancing
* **Cost tracking and optimization** to manage your AI spend
* **Access to 1600+ LLMs** through a single integration
* **Guardrails** to keep agent behavior safe and compliant
* **Version-controlled prompts** for consistent agent performance
Traces provide a hierarchical view of your agent's execution, showing the sequence of LLM calls, tool invocations, and state transitions.
```python theme={"system"}
# Add trace_id to enable hierarchical tracing in Portkey
llm = ChatOpenAI(
api_key="dummy",
base_url="https://api.portkey.ai/v1",
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
provider="@YOUR_LLM_PROVIDER",
trace_id="unique-session-id", # Add unique trace ID
metadata={"request_type": "user_query"}
)
)
```
LangGraph also offers its own tracing via LangSmith, which can be used alongside Portkey for even more detailed workflow insights.
Portkey logs every interaction with LLMs, including:
* Complete request and response payloads
* Latency and token usage metrics
* Cost calculations
* Tool calls and function executions
All logs can be filtered by metadata, trace IDs, models, and more, making it easy to debug specific agent runs.
Portkey provides built-in dashboards that help you:
* Track cost and token usage across all agent runs
* Analyze performance metrics like latency and success rates
* Identify bottlenecks in your agent workflows
* Compare different agent configurations and LLMs
You can filter and segment all metrics by custom metadata to analyze specific agent types, user groups, or use cases.
Add custom metadata to your LangGraph agent calls to enable powerful filtering and segmentation:
```python theme={"system"}
llm = ChatOpenAI(
api_key="dummy",
base_url="https://api.portkey.ai/v1",
default_headers=createHeaders(
api_key="YOUR_PORTKEY_API_KEY",
provider="@YOUR_LLM_PROVIDER",
metadata={
"agent_type": "search_agent",
"environment": "production",
"_user": "user_123", # Special _user field for user analytics
"graph_id": "complex_workflow"
}
)
)
```
This metadata can be used to filter logs, traces, and metrics on the Portkey dashboard, allowing you to analyze specific agent runs, users, or environments.
This enables:
* Per-user cost tracking and budgeting
* Personalized user analytics
* Team or organization-level metrics
* Environment-specific monitoring (staging vs. production)
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made to your LLMs. These logs include:
* Complete request and response tracking
* Metadata tags for filtering
* Cost attribution and much more...
### 3. Unified Access to 1600+ LLMs
You can easily switch between 1600+ LLMs. Call various LLMs such as Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, AWS Bedrock, and many more by simply changing the `provider` in your default `config` object.
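For example, switching an agent between providers is just a change of the provider slug passed in the request headers, using the same OpenAI-compatible client pattern shown throughout these integrations. In this sketch, both provider slugs and model names are placeholders for providers you have added in Portkey:
```python theme={"system"}
from langchain_openai import ChatOpenAI
from portkey_ai import createHeaders

# Sketch: the only thing that changes between providers is the slug (and model name)
def portkey_llm(provider_slug: str, model: str) -> ChatOpenAI:
    return ChatOpenAI(
        model=model,
        api_key="dummy",  # authentication is handled by the Portkey API key
        base_url="https://api.portkey.ai/v1",
        default_headers=createHeaders(
            api_key="YOUR_PORTKEY_API_KEY",
            provider=provider_slug
        )
    )

openai_llm = portkey_llm("@openai-provider-slug", "gpt-4o")
anthropic_llm = portkey_llm("@anthropic-provider-slug", "claude-3-5-sonnet-latest")
```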
### 4. Advanced Metadata Tracking
Using Portkey, you can add custom metadata to your LLM requests for detailed tracking and analytics. Use metadata tags to filter logs, track usage, and attribute costs across departments and teams.
### 5. [Traces](/product/observability/traces)
With traces, you can see each agent run granularly on Portkey. Tracing your LlamaIndex agent runs helps in debugging, performance optimization, and visualizing exactly how your agents are running.
### Using Traces in LlamaIndex Agents
#### Step 1: Import & Initialize the Portkey LlamaIndex Callback Handler
```py theme={"system"}
from portkey_ai.llamaindex import LlamaIndexCallbackHandler
portkey_handler = LlamaIndexCallbackHandler(
api_key="YOUR_PORTKEY_API_KEY",
metadata={
"session_id": "session_1", # Use consistent metadata across your application
"agent_id": "research_agent_1", # Specific to the current agent
}
)
```
#### Step 2: Configure Your LLM with the Portkey Callback
```py theme={"system"}
from llama_index.llms.openai import OpenAI
llm = OpenAI(
api_key="YOUR_OPENAI_API_KEY_HERE",
callbacks=[portkey_handler], # Replace with your OpenAI API key
# ... other parameters
)
```
With Portkey tracing, you can encapsulate the complete execution of your agent workflow.
### 6. [Continuous Improvement](/product/observability/feedback)
Improve your Agent runs by capturing qualitative & quantitative user feedback on your requests. Portkey's Feedback APIs provide a simple way to get weighted feedback from customers on any request you served, at any stage in your app. You can capture this feedback on a request or conversation level and analyze it by adding metadata to the relevant request.
### 7. [Caching](/product/ai-gateway/cache-simple-and-semantic)
Agent runs are time-consuming and expensive due to their complex pipelines. Caching can significantly reduce these costs by storing frequently used data and responses. Portkey offers a built-in caching system that stores past responses, reducing the need for repeated agent calls and saving both time and money.
```py theme={"system"}
{
"cache": {
"mode": "semantic" // Choose between "simple" or "semantic"
}
}
```
### 8. [Security & Compliance](/product/enterprise-offering/security-portkey)
Set budget limits on provider API keys and implement fine-grained user roles and permissions for both the app and the Portkey APIs.
***
## [Portkey Config](/product/ai-gateway/configs)
Many of these features are driven by Portkey's Config architecture. The Portkey app simplifies creating, managing, and versioning your Configs.
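For orientation, a Config is a simple JSON object; here's a hedged sketch expressed as a Python dict (the providers, keys, and models below are placeholders you'd replace with your own, or you can save an equivalent Config in the Portkey app and reference it by its ID):
```py theme={"system"}
config = {
    "retry": {"attempts": 3},          # Automatic retries on failures
    "cache": {"mode": "simple"},       # "simple" or "semantic" caching
    "strategy": {"mode": "fallback"},  # Fall back across the targets below, in order
    "targets": [
        {"provider": "openai", "api_key": "OPENAI_API_KEY",
         "override_params": {"model": "gpt-4o"}},
        {"provider": "anthropic", "api_key": "ANTHROPIC_API_KEY",
         "override_params": {"model": "claude-3-5-sonnet-latest"}},
    ],
}
```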
For more information on using these features and setting up your Config, please refer to the [Portkey documentation](https://docs.portkey.ai).
# OpenAI Agents SDK (Python)
Source: https://docs.portkey.ai/docs/integrations/agents/openai-agents
Use Portkey with OpenAI Agents SDK to take your AI Agents to production
## Introduction
OpenAI Agents SDK enables the development of complex AI agents with tools, planning, and memory capabilities. Portkey enhances OpenAI Agents with observability, reliability, and production-readiness features.
Portkey turns your experimental OpenAI Agents into production-ready systems by providing:
* **Complete observability** of every agent step, tool use, and interaction
* **Built-in reliability** with fallbacks, retries, and load balancing
* **Cost tracking and optimization** to manage your AI spend
* **Access to 1600+ LLMs** through a single integration
* **Guardrails** to keep agent behavior safe and compliant
* **Version-controlled prompts** for consistent agent performance
Traces provide a hierarchical view of your agent's execution, showing the sequence of LLM calls, tool invocations, and state transitions.
```python theme={"system"}
# Add tracing to your OpenAI Agents
from openai import AsyncOpenAI
from agents import set_default_openai_client
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
import os
portkey = AsyncOpenAI(
base_url=PORTKEY_GATEWAY_URL,
api_key=os.environ["PORTKEY_API_KEY"],
default_headers=createHeaders(
trace_id="unique_execution_trace_id", # Add unique trace ID
provider="@YOUR_OPENAI_PROVIDER"
)
)
set_default_openai_client(portkey)
```
Portkey logs every interaction with LLMs, including:
* Complete request and response payloads
* Latency and token usage metrics
* Cost calculations
* Tool calls and function executions
All logs can be filtered by metadata, trace IDs, models, and more, making it easy to debug specific agent runs.
Portkey provides built-in dashboards that help you:
* Track cost and token usage across all agent runs
* Analyze performance metrics like latency and success rates
* Identify bottlenecks in your agent workflows
* Compare different agent configurations and LLMs
You can filter and segment all metrics by custom metadata to analyze specific agent types, user groups, or use cases.
Add custom metadata to your OpenAI agent calls to enable powerful filtering and segmentation:
```python theme={"system"}
# Add custom metadata to your OpenAI Agents
from openai import AsyncOpenAI
from agents import set_default_openai_client
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
import os
portkey = AsyncOpenAI(
base_url=PORTKEY_GATEWAY_URL,
api_key=os.environ["PORTKEY_API_KEY"],
default_headers=createHeaders(
metadata={"agent_type": "research_agent"}, # Add custom metadata
provider="@YOUR_OPENAI_PROVIDER"
)
)
set_default_openai_client(portkey)
```
This metadata can be used to filter logs, traces, and metrics on the Portkey dashboard, allowing you to analyze specific agent runs, users, or environments.
This enables:
* Per-user cost tracking and budgeting
* Personalized user analytics
* Team or organization-level metrics
* Environment-specific monitoring (staging vs. production)
Official OpenAI Agents SDK documentation
Example implementations for various use cases
Get personalized guidance on implementing this integration
Traces provide a hierarchical view of your agent's execution, showing the sequence of LLM calls, tool invocations, and state transitions.
```typescript theme={"system"}
// Add tracing to your OpenAI Agents
const portkey = new OpenAI({
baseURL: PORTKEY_GATEWAY_URL,
apiKey: process.env.PORTKEY_API_KEY!,
defaultHeaders: createHeaders({
traceId: "unique_execution_trace_id", // Add unique trace ID
provider:"@YOUR_OPENAI_PROVIDER"
})
});
setDefaultOpenAIClient(portkey);
```
Portkey logs every interaction with LLMs, including:
* Complete request and response payloads
* Latency and token usage metrics
* Cost calculations
* Tool calls and function executions
All logs can be filtered by metadata, trace IDs, models, and more, making it easy to debug specific agent runs.
Portkey provides built-in dashboards that help you:
* Track cost and token usage across all agent runs
* Analyze performance metrics like latency and success rates
* Identify bottlenecks in your agent workflows
* Compare different agent configurations and LLMs
You can filter and segment all metrics by custom metadata to analyze specific agent types, user groups, or use cases.
Add custom metadata to your OpenAI agent calls to enable powerful filtering and segmentation:
```typescript theme={"system"}
// Add metadata to your OpenAI Agents
const portkey = new OpenAI({
baseURL: PORTKEY_GATEWAY_URL,
apiKey: process.env.PORTKEY_API_KEY!,
defaultHeaders: createHeaders({
metadata: {"agent_type": "research_agent"}, // Add custom metadata
provider:"@YOUR_OPENAI_PROVIDER"
})
});
setDefaultOpenAIClient(portkey);
```
This metadata can be used to filter logs, traces, and metrics on the Portkey dashboard, allowing you to analyze specific agent runs, users, or environments.
This enables:
* Per-user cost tracking and budgeting
* Personalized user analytics
* Team or organization-level metrics
* Environment-specific monitoring (staging vs. production)
Official OpenAI Agents SDK documentation
Example implementations for various use cases
Get personalized guidance on implementing this integration
Call various LLMs like Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, and AWS Bedrock with minimal code changes.
Speed up agent responses and save costs by storing past responses in the Portkey cache. Choose between Simple and Semantic cache modes.
Set up fallbacks between different LLMs, load balance requests across multiple instances, set automatic retries, and request timeouts.
Get comprehensive logs of agent interactions, including cost, tokens used, response time, and function calls. Send custom metadata for better analytics.
Access detailed logs of agent executions, function calls, and interactions. Debug and optimize your agents effectively.
Implement budget limits, role-based access control, and audit trails for your agent operations.
Capture and analyze user feedback to improve agent performance over time.
#### Send Custom Metadata with your requests
Add trace IDs to track specific workflows:
```python theme={"system"}
from portkey_ai import Portkey
portkey = Portkey(
api_key="YOUR_PORTKEY_API_KEY",
provider="@YOUR_PROVIDER",
trace_id="weather_workflow_123",
metadata={
"agent": "weather_agent",
"environment": "production"
}
)
```
## 5. [Logs and Traces](/product/observability/logs)
Logs are essential for understanding agent behavior, diagnosing issues, and improving performance. They provide a detailed record of agent activities and tool use, which is crucial for debugging and optimizing processes.
Access a dedicated section to view records of agent executions, including parameters, outcomes, function calls, and errors. Filter logs based on multiple parameters such as trace ID, model, tokens used, and metadata.
## 6. [Security & Compliance - Enterprise-Ready Controls](/product/enterprise-offering/security-portkey)
When deploying agents in production, security is crucial. Portkey provides enterprise-grade security features:
# Phidata
Source: https://docs.portkey.ai/docs/integrations/agents/phidata
Use Portkey with Phidata to take your AI Agents to production
## Getting started
### 1. Install the required packages:
```sh theme={"system"}
pip install phidata portkey-ai
```
### 2. Configure your Phidata LLM objects:
```py theme={"system"}
from phi.llm.openai import OpenAIChat
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
llm = OpenAIChat(
base_url=PORTKEY_GATEWAY_URL,
api_key="OPENAI_API_KEY", #Replace with Your OpenAI Key
default_headers=createHeaders(
provider="openai",
api_key=PORTKEY_API_KEY # Replace with your Portkey API key
)
)
```
## Integration Guide
Here's a simple Colab notebook that demonstrates the Phidata integration with Portkey.
### 5. [Continuous Improvement](/product/observability/feedback)
Improve your Agent runs by capturing qualitative & quantitative user feedback on your requests. Portkey's Feedback APIs provide a simple way to get weighted feedback from customers on any request you served, at any stage in your app. You can capture this feedback on a request or conversation level and analyze it by adding meta data to the relevant request.
### 6. [Caching](/product/ai-gateway/cache-simple-and-semantic)
Agent runs are time-consuming and expensive due to their complex pipelines. Caching can significantly reduce these costs by storing frequently used data and responses. Portkey offers a built-in caching system that stores past responses, reducing the need for repeated agent calls and saving both time and money.
```py theme={"system"}
{
"cache": {
"mode": "semantic" // Choose between "simple" or "semantic"
}
}
```
### 7. [Security & Compliance](/product/enterprise-offering/security-portkey)
Set budget limits on provider API keys and implement fine-grained user roles and permissions for both the app and the Portkey APIs.
## [Portkey Config](/product/ai-gateway/configs)
Many of these features are driven by Portkey's Config architecture. The Portkey app simplifies creating, managing, and versioning your Configs.
For more information on using these features and setting up your Config, please refer to the [Portkey documentation](https://docs.portkey.ai).
# Pydantic AI
Source: https://docs.portkey.ai/docs/integrations/agents/pydantic-ai
Use Portkey with PydanticAI to take your AI Agents to production
## Introduction
PydanticAI is a Python agent framework designed to make it less painful to build production-grade applications with Generative AI. It brings the same ergonomic design and developer experience to GenAI that FastAPI brought to web development.
Portkey enhances PydanticAI with production-readiness features, turning your experimental agents into robust systems by providing:
* **Complete observability** of every agent step, tool use, and interaction
* **Built-in reliability** with fallbacks, retries, and load balancing
* **Cost tracking and optimization** to manage your AI spend
* **Access to 1600+ LLMs** through a single integration
* **Guardrails** to keep agent behavior safe and compliant
* **OpenTelemetry integration** for comprehensive monitoring
Traces provide a hierarchical view of your agent's execution, showing the sequence of LLM calls, tool invocations, and state transitions.
```python theme={"system"}
# Add trace_id to enable hierarchical tracing in Portkey
from openai import AsyncOpenAI
portkey_client = AsyncOpenAI(
api_key="YOUR_PORTKEY_API_KEY",
base_url="https://api.portkey.ai/v1",
default_headers={"x-portkey-trace-id": "unique-session-id"}
)
```
Portkey logs every interaction with LLMs, including:
* Complete request and response payloads
* Latency and token usage metrics
* Cost calculations
* Tool calls and function executions
All logs can be filtered by metadata, trace IDs, models, and more, making it easy to debug specific agent runs.
Portkey provides built-in dashboards that help you:
* Track cost and token usage across all agent runs
* Analyze performance metrics like latency and success rates
* Identify bottlenecks in your agent workflows
* Compare different agent configurations and LLMs
You can filter and segment all metrics by custom metadata to analyze specific agent types, user groups, or use cases.
Add custom metadata to your PydanticAI agent calls to enable powerful filtering and segmentation:
```python theme={"system"}
from openai import AsyncOpenAI
portkey_client = AsyncOpenAI(
api_key="YOUR_PORTKEY_API_KEY",
base_url="https://api.portkey.ai/v1",
default_headers={
"x-portkey-metadata": '{"agent_type": "weather_agent", "environment": "production", "_user": "user_123", "request_source": "mobile_app"}'
}
)
```
This metadata can be used to filter logs, traces, and metrics on the Portkey dashboard, allowing you to analyze specific agent runs, users, or environments.
This enables:
* Per-user cost tracking and budgeting
* Personalized user analytics
* Team or organization-level metrics
* Environment-specific monitoring (staging vs. production)
Official PydanticAI documentation
Official Portkey documentation
Get personalized guidance on implementing this integration
```python {11} theme={"system"}
from strands import Agent
from strands.models.openai import OpenAIModel
from strands_tools import calculator
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
model = OpenAIModel(
client_args={
"api_key": "YOUR_PORTKEY_API_KEY",
"base_url": PORTKEY_GATEWAY_URL,
# Add trace ID to group related requests
"default_headers": createHeaders(trace_id="user_session_123")
},
model_id="gpt-4o",
params={"temperature": 0.7}
)
agent = Agent(model=model, tools=[calculator])
response = agent("What's 15% of 2,847?")
```
All requests from this agent will be grouped under the same trace, making it easy to analyze the complete interaction flow.
Get personalized guidance on implementing this integration
### 2. Add Aporia's Guardrail Check
* Now, navigate to the `Guardrails` page
* Search for `Validate - Project` Guardrail Check and click on `Add`
* Input your corresponding Aporia Project ID where you are defining the policies
* Save the check, set any actions you want on the check, and create the Guardrail!
| Check Name | Description | Parameters | Supported Hooks |
| :------------------ | :---------------------------------------------------------------------------------- | :----------------- | :----------------------------------- |
| Validate - Projects | Runs a project containing policies set in Aporia and returns a PASS or FAIL verdict | Project ID: string | beforeRequestHooks afterRequestHooks |
Your Aporia Guardrail is now ready to be added to any Portkey request you'd like!
### 3. Add Guardrail ID to a Config and Make Your Request
* When you save a Guardrail, you'll get an associated Guardrail ID - add this ID to the `before_request_hooks` or `after_request_hooks` params in your Portkey Config
* Save this Config and pass it along with any Portkey request you're making!
Your requests are now guarded by your Aporia policies, and you can see the verdict and any actions taken directly in your Portkey logs. More detailed logs for your requests will also be available on your Aporia dashboard.
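For reference, here's a minimal sketch of such a Config attached via the Python SDK (the Guardrail ID below is a placeholder for the one Portkey gives you):
```py theme={"system"}
from portkey_ai import Portkey

config = {
    "before_request_hooks": [{"id": "your-aporia-guardrail-id"}],  # Placeholder ID
    "after_request_hooks": [{"id": "your-aporia-guardrail-id"}],   # Placeholder ID
}

portkey = Portkey(
    api_key="YOUR_PORTKEY_API_KEY",
    config=config,  # Every request from this client now runs the guardrail checks
)
```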
***
## Get Support
If you face any issues with the Aporia integration, just ping the @Aporia team on the [community forum](https://discord.gg/portkey-llms-in-prod-1143393887742861333).
# Azure Guardrails
Source: https://docs.portkey.ai/docs/integrations/guardrails/azure-guardrails
Integrate Microsoft Azure's powerful content moderation services & PII guardrails with Portkey
Microsoft Azure offers robust content moderation and PII redaction services that can now be seamlessly integrated with Portkey's guardrails ecosystem. This integration supports two powerful Azure services for content moderation and PII detection.
## Using Azure Guardrails - Scenarios
After setting up your guardrails, there are different ways to use them depending on your security requirements:
### Detect and Monitor Only
To simply detect but not block content:
* Configure your guardrail actions without enabling "Deny"
* Monitor the guardrail results in your Portkey logs
* If any issues are detected, the response will include a `hook_results` object with details (see the sketch below)
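As a hedged illustration of where to look for that object on a raw gateway response (header names follow Portkey's `x-portkey-*` convention; the Config ID is a placeholder for one that references your Azure guardrail):
```py theme={"system"}
import requests

response = requests.post(
    "https://api.portkey.ai/v1/chat/completions",
    headers={
        "x-portkey-api-key": "YOUR_PORTKEY_API_KEY",
        "x-portkey-config": "your-guarded-config-id",  # Placeholder Config ID
        "Content-Type": "application/json",
    },
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]},
)

body = response.json()
# In monitor-only mode the completion comes back as usual; guardrail details
# (if any checks ran) appear in the hook_results object described above.
print(body.get("hook_results", {}))
```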
### Redact PII Automatically
To automatically remove sensitive information:
* Enable the `Redact` option for Azure PII Detection
* When PII is detected, it will be automatically redacted and replaced with standardized identifiers
* The response will include a `transformed` flag set to `true` in the results
### Block Harmful Content
To completely block requests that violate your policies:
* Enable the `Deny` option in the guardrails action tab
* If harmful content is detected, the request will fail with an appropriate status code
* You can customize denial messages to provide guidance to users
***
## Need Support?
If you encounter any issues with Azure Guardrails, please reach out to our support team through the [Portkey community forum](https://discord.gg/portkey-llms-in-prod-1143393887742861333).
# AWS Bedrock Guardrails
Source: https://docs.portkey.ai/docs/integrations/guardrails/bedrock-guardrails
Secure your AI applications with AWS Bedrock's guardrail capabilities through Portkey.
[AWS Bedrock Guardrails](https://aws.amazon.com/bedrock/) provides a comprehensive solution for securing your LLM applications, including content filtering, PII detection and redaction, and more.
To get started with AWS Bedrock Guardrails, visit their documentation:
### 2. Add Patronus' Guardrail Checks & Actions
Navigate to the `Guardrails` page and you will see the Guardrail Checks offered by Patronus there. Add the ones you want, set actions, and create the Guardrail!
#### List of Patronus Guardrail Checks
| Check Name | Description | Parameters | Supported Hooks |
| :------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------ | :----------------------------------- | :---------------- |
| Retrieval Answer Relevance | Checks whether the answer is on-topic to the input question. Does not measure correctness. | **ON** or **OFF** | afterRequestHooks |
| Custom Evaluator | Checks against custom criteria, based on Patronus evaluator profile name. | **string**(evaluator's profile name) | afterRequestHooks |
| Is Concise | Check that the output is clear and concise. | **ON** or **OFF** | afterRequestHooks |
| Is Helpful | Check that the output is helpful in its tone of voice. | **ON** or **OFF** | afterRequestHooks |
| Is Polite | Check that the output is polite in conversation. | **ON** or **OFF** | afterRequestHooks |
| No Apologies | Check that the output does not contain apologies. | **ON** or **OFF** | afterRequestHooks |
| No Gender Bias | Check whether the output contains gender stereotypes. Useful to mitigate PR risk from sexist or gendered model outputs. | **ON** or **OFF** | afterRequestHooks |
| No Racial Bias | Check whether the output contains any racial stereotypes. | **ON** or **OFF** | afterRequestHooks |
| Detect Toxicity | Checks output for abusive and hateful messages. | **ON** or **OFF** | afterRequestHooks |
| Detect PII | Checks for personally identifiable information (PII) - this is information that, in conjunction with other data, can identify an individual. | **ON** or **OFF** | afterRequestHooks |
| Detect PHI | Checks for protected health information (PHI), defined broadly as any information about an individual's health status or provision of healthcare. | **ON** or **OFF** | afterRequestHooks |
Your Patronus Guardrail is now ready to be added to any Portkey request you'd like!
### 3. Add Guardrail ID to a Config and Make Your Request
Your Pillar Guardrail is now ready to be added to any Portkey request you'd like!
### 3. Add Guardrail ID to a Config and Make Your Request
* When you save a Guardrail, you'll get an associated Guardrail ID - add this ID to the `before_request_hooks` or `after_request_hooks` params in your Portkey Config
* Save this Config and pass it along with any Portkey request you're making!
Your requests are now guarded by your Pillar checks and you can see the Verdict and any action you take directly on Portkey logs! More detailed logs for your requests will also be available on your Pillar dashboard.
***
## Get Support
If you face any issues with the Pillar integration, just ping the @Pillar team on the [community forum](https://discord.gg/portkey-llms-in-prod-1143393887742861333).
# Prompt Security
Source: https://docs.portkey.ai/docs/integrations/guardrails/prompt-security
Prompt Security detects and protects against prompt injection, sensitive data exposure, and other AI security threats.
[Prompt Security](https://www.prompt.security/solutions/employees) provides advanced protection for your AI applications against various security threats including prompt injections and sensitive data exposure, helping ensure safe interactions with LLMs.
To get started with Prompt Security, visit their website:
1. **Navigate to Guardrails**: Go to the `Guardrails` page and click `Create`
2. **Select Regex Replace**: Choose the "Regex Replace" guardrail from the BASIC category
3. **Configure the Pattern**:
* **Regex Rule**: Enter your regex pattern to match specific data (e.g., `\b\d{3}-\d{2}-\d{4}\b` for SSN patterns)
* **Replacement Text**: Define what to replace matches with (e.g., `[REDACTED]`, `*****`, `[SSN_HIDDEN]`)
4. **Save the Guardrail**: Name your guardrail and save it to get the associated Guardrail ID
### Common Regex Patterns for Sensitive Data
| Pattern Type | Regex Pattern | Replacement Example |
| ---------------------- | ---------------------------------------------------- | ------------------- |
| Credit Card | `\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b` | `[CREDIT_CARD]` |
| Social Security Number | `\b\d{3}-\d{2}-\d{4}\b` | `[SSN_REDACTED]` |
| Phone Numbers | `\b\d{3}[-.]\d{3}[-.]\d{4}\b` | `[PHONE_HIDDEN]` |
| Email Addresses | `\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b` | `[EMAIL_REDACTED]` |
| Custom Employee IDs | `EMP-\d{6}` | `[EMPLOYEE_ID]` |
### Adding to Your Config
Once you've created your custom regex pattern guardrail, add it to your Portkey config:
```json theme={"system"}
{
"before_request_hooks": [
{"id": "your-guardrail-id"}
],
"after_request_hooks": [
{"id": "your-guardrail-id"}
]
}
```
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made by Anthropic Computer Use. These logs include:
* Complete request and response tracking
* Code context and generation metrics
* Developer attribution
* Cost breakdown per coding session
### 3. Unified Access to 250+ LLMs
Easily switch between 250+ LLMs for different coding tasks. Use GPT-4 for complex architecture decisions, Claude for detailed code reviews, or specialized models for specific languages - all through a single interface.
### 4. Advanced Metadata Tracking
Track coding patterns and productivity metrics with custom metadata:
* Language and framework usage
* Code generation vs completion tasks
* Time-of-day productivity patterns
* Project-specific metrics
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made to your LLMs. These logs include:
* Complete request and response tracking
* Metadata tags for filtering
* Cost attribution and much more...
### 3. Unified Access to 1600+ LLMs
You can easily switch between 1600+ LLMs. Call various LLMs such as Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, AWS Bedrock, and many more by simply changing the `virtual key` in your default `config` object.
### 4. Advanced Metadata Tracking
Using Portkey, you can add custom metadata to your LLM requests for detailed tracking and analytics. Use metadata tags to filter logs, track usage, and attribute costs across departments and teams.
## Quick Start Integration
Autogen supports a concept of `config_list`, which lets you define the LLM provider and model to be used. Portkey seamlessly integrates into the Autogen framework through a custom config we create.
### Example using minimal configuration
```py theme={"system"}
from autogen import AssistantAgent, UserProxyAgent, config_list_from_json
# Import the portkey library to fetch helper functions
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config_list = [
{
"api_key": 'Your OpenAI Key',
"model": "gpt-3.5-turbo",
"base_url": PORTKEY_GATEWAY_URL,
"api_type": "openai",
"default_headers": createHeaders(
api_key = "Your Portkey API Key",
provider = "openai",
)
}
]
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent("user_proxy", code_execution_config={"work_dir": "coding", "use_docker": False}) # IMPORTANT: set to True to run code in docker, recommended
user_proxy.initiate_chat(assistant, message="Say this is also a test - part 2.")
# This initiates an automated chat between the two agents to solve the task
```
Notice that we updated the `base_url` to Portkey's AI Gateway and added `default_headers` to enable Portkey-specific features.
When we execute this script, it yields the same results as without Portkey, but every request can now be inspected in the Portkey Analytics & Logs UI, including token, cost, and accuracy calculations.
All the config parameters supported in Portkey are available for use as part of the headers. Let's look at some examples:
## Using 150+ models in Autogen through Portkey
Since Portkey [seamlessly connects to 150+ models across providers](/integrations/llms), you can easily run any of them with Autogen.
Let's see an example using **Mistral-7B on Anyscale** running with Autogen seamlessly:
```py theme={"system"}
from autogen import AssistantAgent, UserProxyAgent, config_list_from_json
# Import the portkey library to fetch helper functions
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config_list = [
{
"api_key": 'Your Anyscale API Key',
"model": "mistralai/Mistral-7B-Instruct-v0.1",
"base_url": PORTKEY_GATEWAY_URL,
"api_type": "openai", # Portkey conforms to the openai api_type
"default_headers": createHeaders(
api_key = "Your Portkey API Key",
provider = "anyscale",
)
}
]
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent("user_proxy", code_execution_config={"work_dir": "coding", "use_docker": False}) # IMPORTANT: set to True to run code in docker, recommended
user_proxy.initiate_chat(assistant, message="Say this is also a test - part 2.")
# This initiates an automated chat between the two agents to solve the task
```
## Using a Virtual Key
[Virtual keys](/product/ai-gateway/virtual-keys) in Portkey allow you to easily switch between providers without manually having to store and change their API keys. Let's use the same Mistral example above, but this time using a Virtual Key.
```py theme={"system"}
from autogen import AssistantAgent, UserProxyAgent, config_list_from_json
# Import the portkey library to fetch helper functions
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config_list = [
{
# Set a dummy value, since we'll pick the API key from the virtual key
"api_key": 'X',
# Pick the model from the provider of your choice
"model": "mistralai/Mistral-7B-Instruct-v0.1",
"base_url": PORTKEY_GATEWAY_URL,
"api_type": "openai", # Portkey conforms to the openai api_type
"default_headers": createHeaders(
api_key = "Your Portkey API Key",
# Add your virtual key here
virtual_key = "Your Anyscale Virtual Key",
)
}
]
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent("user_proxy", code_execution_config={"work_dir": "coding", "use_docker": False}) # IMPORTANT: set to True to run code in docker, recommended
user_proxy.initiate_chat(assistant, message="Say this is also a test - part 2.")
# This initiates an automated chat between the two agents to solve the task
```
## Using Configs
[Configs](/product/ai-gateway/configs) in Portkey unlock advanced management and routing functionality including [load balancing](/product/ai-gateway/load-balancing), [fallbacks](/product/ai-gateway/fallbacks), [canary testing](/product/ai-gateway/canary-testing), [switching models](/product/ai-gateway/universal-api) and more.
You can use Portkey configs in Autogen like this:
```py theme={"system"}
from autogen import AssistantAgent, UserProxyAgent, config_list_from_json
# Import the portkey library to fetch helper functions
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
config_list = [
{
# Set a dummy value, since we'll pick the API key from the virtual key
"api_key": 'X',
# Pick the model from the provider of your choice
"model": "mistralai/Mistral-7B-Instruct-v0.1",
"base_url": PORTKEY_GATEWAY_URL,
"api_type": "openai", # Portkey conforms to the openai api_type
"default_headers": createHeaders(
api_key = "Your Portkey API Key",
# Add your Portkey config id
config = "Your Config ID",
)
}
]
assistant = AssistantAgent("assistant", llm_config={"config_list": config_list})
user_proxy = UserProxyAgent("user_proxy", code_execution_config={"work_dir": "coding", "use_docker": False}) # IMPORTANT: set to True to run code in docker, recommended
user_proxy.initiate_chat(assistant, message="Say this is also a test - part 2.")
# This initiates an automated chat between the two agents to solve the task
```
# Claude Code
Source: https://docs.portkey.ai/docs/integrations/libraries/claude-code
Integrate Portkey with Claude Code for enterprise-grade AI coding assistance with observability and governance
Claude Code is Anthropic's agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster through natural language commands.
With Portkey integration, you can enhance Claude Code with enterprise features:
* **Unified AI Gateway** - Route Claude Code through multiple providers (Anthropic, Bedrock, Vertex AI)
* **Centralized AI observability** - Real-time usage tracking for 40+ key metrics and logs for every request
* **Governance** - Real-time spend tracking, budget limits, and RBAC for your Claude Code setup
# 1. Setting up Portkey
Portkey allows you to use 1600+ LLMs with your Claude Code setup, with minimal configuration required. Let's set up the core components in Portkey that you'll need for integration.
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made by Claude Code. These logs include:
* Complete request and response tracking
* Code context and generation metrics
* Developer attribution
* Cost breakdown per coding session
### 3. Unified Access to 250+ LLMs
Easily switch between 250+ LLMs for different coding tasks. Use GPT-4 for complex architecture decisions, Claude for detailed code reviews, or specialized models for specific languages - all through a single interface.
### 4. Advanced Metadata Tracking
Track coding patterns and productivity metrics with custom metadata:
* Language and framework usage
* Code generation vs completion tasks
* Time-of-day productivity patterns
* Project-specific metrics
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made by Cline. These logs include:
* Complete request and response tracking
* Code context and generation metrics
* Developer attribution
* Cost breakdown per coding session
### 3. Unified Access to 250+ LLMs
Easily switch between 250+ LLMs for different coding tasks. Use GPT-4 for complex architecture decisions, Claude for detailed code reviews, or specialized models for specific languages - all through a single interface.
### 4. Advanced Metadata Tracking
Track coding patterns and productivity metrics with custom metadata:
* Language and framework usage
* Code generation vs completion tasks
* Time-of-day productivity patterns
* Project-specific metrics
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made to your LLMs. These logs include:
* Complete request and response tracking
* Metadata tags for filtering
* Cost attribution and much more...
### 3. Unified Access to 250+ LLMs
You can easily switch between 250+ LLMs. Call various LLMs such as Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, AWS Bedrock, and many more by simply changing the `virtual_key` in your default `config` object.
### 4. Advanced Metadata Tracking
Using Portkey, you can add custom metadata to your LLM requests for detailed tracking and analytics. Use metadata tags to filter logs, track usage, and attribute costs across departments and teams.
# 3. Set Up Enterprise Governance for Cursor
**Why Enterprise Governance?**
If you are using Cursor inside your organization, you need to consider several governance aspects:
* **Cost Management**: Controlling and tracking AI spending across teams
* **Access Control**: Managing team access and workspaces
* **Usage Analytics**: Understanding how AI is being used across the organization
* **Security & Compliance**: Maintaining enterprise security standards
* **Reliability**: Ensuring consistent service across all users
* **Model Management**: Managing what models are being used in your setup
Portkey adds a comprehensive governance layer to address these enterprise needs.
**Enterprise Implementation Guide**
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made to your LLMs. These logs include:
* Complete request and response tracking
* Metadata tags for filtering
* Cost attribution and much more...
### 3. Unified Access to 1600+ LLMs
You can easily switch between 1600+ LLMs. Call various LLMs such as Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, AWS Bedrock, and many more by simply changing the `virtual key` in your default `config` object.
### 4. Advanced Metadata Tracking
Using Portkey, you can add custom metadata to your LLM requests for detailed tracking and analytics. Use metadata tags to filter logs, track usage, and attribute costs across departments and teams.
* `Request Details`: Information about the specific request, including the model used, input, and output.
* `Metrics`: Performance metrics such as latency, token usage, and cost.
* `Logs`: Detailed logs of the request, including any errors or warnings.
* `Traces`: A visual representation of the request flow, especially useful for complex DSPy modules.
## Portkey Features with DSPy
### 1. Interoperability
Portkey's Unified API enables you to easily switch between **250+** language models. Simply change the provider slug and model name in your model string:
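For instance, here's a hedged sketch of pointing the same DSPy setup at a different provider; the provider slug and model name below are placeholders, and it assumes the `openai/` prefix stays in place since Portkey's gateway is OpenAI-compatible:
```python theme={"system"}
import dspy

# Swap only the @provider-slug and the model name in the model string
lm = dspy.LM(
    "openai/@your-anthropic-provider-slug/claude-3-5-sonnet-latest",
    api_key="YOUR_PORTKEY_API_KEY",
    api_base="https://api.portkey.ai/v1"
)
dspy.configure(lm=lm)
```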
### 3. Metrics
Portkey's Observability suite helps you track key metrics like **cost** and **token** usage, which is crucial for managing the high cost of DSPy operations. The observability dashboard helps you track 40+ key metrics, giving you detailed insights into your DSPy runs.
### 4. Advanced Configuration
While the basic setup is simple, you can still access advanced Portkey features by configuring them in your Config or through the Portkey dashboard:
* **Caching**: Enable semantic or simple caching to reduce costs
* **Fallbacks**: Set up automatic fallbacks between providers
* **Load Balancing**: Distribute requests across multiple API keys
* **Retries**: Configure automatic retry logic
* **Rate Limiting**: Set rate limits for your API usage
These features are configured at the Virtual Key level in your Portkey dashboard, keeping your DSPy code clean and simple.
## Advanced Example: RAG with DSPy and Portkey
Here's a complete example showing how to build a RAG system with DSPy and Portkey:
```python theme={"system"}
import dspy
# Configure Portkey-enabled LM
lm = dspy.LM(
"openai/@openai-provider-slug/gpt-4o",
api_key="YOUR_PORTKEY_API_KEY",
api_base="https://api.portkey.ai/v1"
)
dspy.configure(lm=lm)
# Define a retrieval function
def search_wikipedia(query: str) -> list[str]:
results = dspy.ColBERTv2(url="http://20.102.90.50:2017/wiki17_abstracts")(query, k=3)
return [x["text"] for x in results]
# Create a RAG chain
rag = dspy.ChainOfThought("context, question -> response")
# Ask a question
question = "What's the name of the castle that David Gregory inherited?"
result = rag(context=search_wikipedia(question), question=question)
print(result)
# Output: Prediction(
# reasoning='David Gregory inherited Kinnairdy Castle in 1664, as mentioned in the context provided.',
# response='The name of the castle that David Gregory inherited is Kinnairdy Castle.'
# )
```
## Troubleshooting
Ensure your model string includes the provider slug (e.g., `openai/@openai-dev/gpt-4o`).
You’ll see a list of providers.
* **Model ID**: e.g., `portkey-model`
* **Display name**: e.g., Custom Portkey Model
* **API endpoint URL**: [https://api.portkey.ai/v1/chat/completions](https://api.portkey.ai/v1/chat/completions)
* **Capabilities**: enable Tools, Vision, and Thinking (as needed for your use)
* **Maximum context tokens**: use your provider’s documented limit; keep defaults if unsure
* **Maximum output tokens**: set per your usage; adjust later if needed
After saving, you should see your **Custom Portkey Model** in Copilot’s model list.
You can now use your Portkey-routed model in Copilot chat.
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made by Github Copilot Chat. These logs include:
* Complete request and response tracking for debugging
* Metadata tags for filtering by team or project
* Cost attribution per task
* Complete conversation history with the AI agent
### 3. Unified Access to 1600+ LLMs
You can easily switch between 1600+ LLMs. Call various LLMs such as Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, AWS Bedrock, and many more by simply changing the `virtual key` in your default `config` object.
### 4. Advanced Metadata Tracking
Using Portkey, you can add custom metadata to your LLM requests for detailed tracking and analytics. Use metadata tags to filter logs, track usage, and attribute costs across engineering teams and projects.
Use [https://api.portkey.ai/v1/chat/completions](https://api.portkey.ai/v1/chat/completions) as the endpoint for OpenAI-compatible chat completions.
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made by Goose. These logs include:
* Complete request and response tracking for debugging
* Metadata tags for filtering by team or project
* Cost attribution per task
* Complete conversation history with the AI agent
### 3. Unified Access to 1600+ LLMs
You can easily switch between 1600+ LLMs. Call various LLMs such as Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, AWS Bedrock, and many more by simply changing the `virtual key` in your default `config` object.
### 4. Advanced Metadata Tracking
Using Portkey, you can add custom metadata to your LLM requests for detailed tracking and analytics. Use metadata tags to filter logs, track usage, and attribute costs across engineering teams and projects.
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made to your LLMs. These logs include:
* Complete request and response tracking
* Metadata tags for filtering
* Cost attribution and much more...
### 3. Unified Access to 1600+ LLMs
You can easily switch between 1600+ LLMs. Call various LLMs such as Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, AWS Bedrock, and many more by simply changing the `virtual key` in your default `config` object.
### 4. Advanced Metadata Tracking
Using Portkey, you can add custom metadata to your LLM requests for detailed tracking and analytics. Use metadata tags to filter logs, track usage, and attribute costs across departments and teams.
## Using Virtual Keys for Multiple Models
Portkey supports [Virtual Keys](/product/ai-gateway/virtual-keys), which are an easy way to store and manage API keys in a secure vault. Let's try using a Virtual Key to make LLM calls.
#### 1. Create a Virtual Key in your Portkey account and copy its ID
Let's try creating a new Virtual Key for Mistral like this
#### 2. Use Virtual Keys in the Portkey Headers
The `virtualKey` parameter sets the authentication and provider details for the AI provider being used. In our case, we're using the Mistral Virtual Key.
This is extremely powerful since we gain control and visibility over the agent flows so we can identify problems and make updates as needed.
# Langchain (Python)
Source: https://docs.portkey.ai/docs/integrations/libraries/langchain-python
Supercharge Langchain apps with Portkey: Multi-LLM, observability, caching, reliability, and prompt management.
This setup enables Portkey's advanced features for Langchain.
## Key Portkey Features for Langchain
Routing Langchain requests via Portkey's `ChatOpenAI` interface unlocks powerful capabilities:
Use `ChatOpenAI` for OpenAI, Anthropic, Gemini, Mistral, and more. Switch providers easily with Virtual Keys or Configs.
Reduce latency and costs with Portkey's Simple, Semantic, or Hybrid caching, enabled via Configs.
Build robust apps with retries, timeouts, fallbacks, and load balancing, configured in Portkey.
Get deep insights: LLM usage, costs, latency, and errors are automatically logged in Portkey.
Manage, version, and use prompts from Portkey's Prompt Library within Langchain.
Securely manage LLM provider API keys using Portkey Virtual Keys in your Langchain setup.
***
## 5. Prompt Management
Portkey's Prompt Library helps manage prompts effectively:
* **Version Control:** Store and track prompt changes.
* **Parameterized Prompts:** Use variables with [mustache templating](/product/prompt-library/prompt-templates#templating-engine).
* **Sandbox:** Test prompts with different LLMs in Portkey.
### Using Portkey Prompts in Langchain
1. Create prompt in Portkey, get `Prompt ID`.
2. Use Portkey SDK to render prompt with variables.
3. Transform rendered prompt to Langchain message format.
4. Pass messages to Portkey-configured `ChatOpenAI`.
```python theme={"system"}
import os
from langchain_openai import ChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders, Portkey
PORTKEY_API_KEY = os.environ.get("PORTKEY_API_KEY")
client = Portkey(api_key=PORTKEY_API_KEY)
PROMPT_ID = "pp-story-generator" # Your Portkey Prompt ID
rendered_prompt = client.prompts.render(
prompt_id=PROMPT_ID,
variables={"character": "brave knight", "object": "magic sword"}
).data
langchain_messages = []
if rendered_prompt and rendered_prompt.prompt:
for msg in rendered_prompt.prompt:
if msg.get("role") == "user": langchain_messages.append(HumanMessage(content=msg.get("content")))
elif msg.get("role") == "system": langchain_messages.append(SystemMessage(content=msg.get("content")))
portkey_headers = createHeaders(api_key=PORTKEY_API_KEY, provider="@openai")
llm_portkey_prompt = ChatOpenAI(
api_key="placeholder_key",
base_url=PORTKEY_GATEWAY_URL,
default_headers=portkey_headers,
model=rendered_prompt.model if rendered_prompt and rendered_prompt.model else "gpt-4o"
)
# if langchain_messages: response = llm_portkey_prompt.invoke(langchain_messages)
```
Manage prompts centrally in Portkey for versioning and collaboration.
***
## 6. Secure Virtual Keys
Portkey's [Virtual Keys](/product/ai-gateway/virtual-keys) are vital for secure, flexible LLM ops with Langchain.
**Benefits:**
* **Secure Credentials:** Store provider API keys in Portkey's vault. Code uses Virtual Key IDs.
* **Easy Configuration:** Switch providers/keys by changing `virtual_key` in `createHeaders`.
* **Access Control:** Manage Virtual Key permissions in Portkey.
* **Auditability:** Track usage via Portkey logs.
Using Virtual Keys boosts security and simplifies config management.
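As a hedged sketch (the Virtual Key ID and model below are placeholders for your own):
```python theme={"system"}
from langchain_openai import ChatOpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

portkey_headers = createHeaders(
    api_key="YOUR_PORTKEY_API_KEY",
    virtual_key="YOUR_MISTRAL_VIRTUAL_KEY"  # Swap Virtual Keys to switch providers
)

llm = ChatOpenAI(
    api_key="placeholder_key",         # Not used; credentials come from the Virtual Key
    base_url=PORTKEY_GATEWAY_URL,
    default_headers=portkey_headers,
    model="mistral-large-latest"       # Assumption: any model your provider supports
)
```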
***
## Langchain Embeddings
Create embeddings with `OpenAIEmbeddings` via Portkey.
```python theme={"system"}
from langchain_openai import OpenAIEmbeddings
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
import os
PORTKEY_API_KEY = os.environ.get("PORTKEY_API_KEY")
portkey_headers = createHeaders(api_key=PORTKEY_API_KEY, provider="@openai")
embeddings_model = OpenAIEmbeddings(
api_key="placeholder_key",
base_url=PORTKEY_GATEWAY_URL,
default_headers=portkey_headers,
model="text-embedding-3-small"
)
# embeddings = embeddings_model.embed_documents(["Hello world!", "Test."])
```
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made to your LLMs. These logs include:
* Complete request and response tracking
* Metadata tags for filtering
* Cost attribution and much more...
### 3. Unified Access to 1600+ LLMs
You can easily switch between 1600+ LLMs. Call various LLMs such as Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, AWS Bedrock, and many more by simply changing the `virtual key` in your default `config` object.
### 4. Advanced Metadata Tracking
Using Portkey, you can add custom metadata to your LLM requests for detailed tracking and analytics. Use metadata tags to filter logs, track usage, and attribute costs across departments and teams.
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made to your LLMs. These logs include:
* Complete request and response tracking
* Metadata tags for filtering
* Cost attribution and much more...
### 3. Unified Access to 1600+ LLMs
You can easily switch between 1600+ LLMs. Call various LLMs such as Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, AWS Bedrock, and many more by simply changing the `virtual key` in your default `config` object.
### 4. Advanced Metadata Tracking
Using Portkey, you can add custom metadata to your LLM requests for detailed tracking and analytics. Use metadata tags to filter logs, track usage, and attribute costs across departments and teams.
Call various LLMs like Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, and AWS Bedrock with minimal code changes.
Speed up your requests and save money on LLM calls by storing past responses in the Portkey cache. Choose between Simple and Semantic cache modes.
Set up fallbacks between different LLMs or providers, load balance your requests across multiple instances or API keys, set automatic retries, and request timeouts.
Portkey automatically logs all the key details about your requests, including cost, tokens used, response time, request and response bodies, and more. Send custom metadata and trace IDs for better analytics and debugging.
Use Portkey as a centralized hub to store, version, and experiment with prompts across multiple LLMs, and seamlessly retrieve them in your LlamaIndex app for easy integration.
Improve your LlamaIndex app by capturing qualitative & quantitative user feedback on your requests.
Set budget limits on provider API keys and implement fine-grained user roles and permissions for both the app and the Portkey APIs.
## Overriding a Saved Config
If you want to use a saved Config from the Portkey app in your LlamaIndex code but need to modify certain parts of it before making a request, you can easily achieve this using Portkey's Configs API. This approach allows you to leverage the convenience of saved Configs while still having the flexibility to adapt them to your specific needs.
#### Here's an example of how you can fetch a saved Config using the Configs API and override the `model` parameter:
```py Overriding Model in a Saved Config theme={"system"}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
import requests
import os
import json
def create_config(config_slug, model):
url = f'https://api.portkey.ai/v1/configs/{config_slug}'
headers = {
'x-portkey-api-key': os.environ.get("PORTKEY_API_KEY"),
'content-type': 'application/json'
}
response = requests.get(url, headers=headers).json()
config = json.loads(response['config'])
config['override_params']['model']=model
return config
config=create_config("pc-llamaindex-xx","gpt-4-turbo")
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
api_key="xx" # Placeholder, no need to set
default_headers=createHeaders(
api_key=os.environ.get("PORTKEY_API_KEY"),
config=config
)
)
messages = [ChatMessage(role="user", content="1729")]
resp = portkey.chat(messages)
print(resp)
```
In this example:
1. We define a helper function `create_config` that takes a `config_slug` and a `model` as parameters.
2. Inside the function, we make a GET request to the Portkey Configs API endpoint to fetch the saved Config using the provided `config_slug`.
3. We extract the `config` object from the API response.
4. We update the `model` parameter in the `override_params` section of the Config with the provided `model`.
5. Finally, we return the customized Config.
We can then use this customized Config when initializing the OpenAI client from LlamaIndex, ensuring that our specific `model` override is applied to the saved Config.
For more details on working with Configs in Portkey, refer to the [**Config documentation**.](/product/ai-gateway/configs)
***
## 1. Interoperability - Calling Anthropic, Gemini, Mistral, and more
Now that we have the OpenAI code up and running, let's see how you can use Portkey to send the request across multiple LLMs - we'll show **Anthropic**, **Gemini**, and **Mistral**. For the full list of providers & LLMs supported, check out [**this doc**](/guides/integrations).
Switching providers just requires **changing 3 lines of code** (see the sketch after this list):
1. Change the `provider name`
2. Change the `API key`, and
3. Change the `model name`
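Here's a hedged sketch of those three changes using a Config object, mirroring the pattern above (the provider, API key, and model are placeholders; the same shape works for Gemini, Mistral, and others):
```py theme={"system"}
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

config = {
    "provider": "anthropic",                                    # 1. Change the provider name
    "api_key": "YOUR_ANTHROPIC_API_KEY",                        # 2. Change the API key
    "override_params": {"model": "claude-3-5-sonnet-latest"},   # 3. Change the model name
}

portkey = OpenAI(
    api_base=PORTKEY_GATEWAY_URL,
    api_key="xx",  # Placeholder, no need to set
    default_headers=createHeaders(
        api_key="YOUR_PORTKEY_API_KEY",
        config=config
    )
)

resp = portkey.chat([ChatMessage(role="user", content="Hello!")])
print(resp)
```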
[**Check out Observability docs here.**](/product/observability)
## 5. Prompt Management
Portkey features an advanced Prompts platform tailor-made for better prompt engineering. With Portkey, you can:
* **Store Prompts with Access Control and Version Control:** Keep all your prompts organized in a centralized location, easily track changes over time, and manage edit/view permissions for your team.
* **Parameterize Prompts**: Define variables and [mustache-approved tags](/product/prompt-library/prompt-templates#templating-engine) within your prompts, allowing for dynamic value insertion when calling LLMs. This enables greater flexibility and reusability of your prompts.
* **Experiment in a Sandbox Environment**: Quickly iterate on different LLMs and parameters to find the optimal combination for your use case, without modifying your LlamaIndex code.
#### Here's how you can leverage Portkey's Prompt Management in your LlamaIndex application:
1. Create your prompt template on the Portkey app, and save it to get an associated `Prompt ID`
2. Before making a Llamaindex request, render the prompt template using the Portkey SDK
3. Transform the retrieved prompt to be compatible with LlamaIndex and send the request!
#### Example: Using a Portkey Prompt Template in LlamaIndex
```py Portkey Prompts in LlamaIndex theme={"system"}
import json
import os
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import ChatMessage
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders, Portkey
### Initialize Portkey client with API key
client = Portkey(api_key=os.environ.get("PORTKEY_API_KEY"))
### Render the prompt template with your prompt ID and variables
prompt_template = client.prompts.render(
prompt_id="pp-prompt-id",
variables={ "movie":"Dune 2" }
).data.dict()
config = {
"provider:"@GROQ_PROVIDER", # You need to send the virtual key separately
"override_params":{
"model":prompt_template["model"], # Set the model name based on the value in the prompt template
"temperature":prompt_template["temperature"] # Similarly, you can also set other model params
}
}
portkey = OpenAI(
api_base=PORTKEY_GATEWAY_URL,
api_key="xx" # Placeholder, no need to set
default_headers=createHeaders(
api_key=os.environ.get("PORTKEY_API_KEY"),
config=config
)
)
### Transform the rendered prompt into LlamaIndex-compatible format
messages = [ChatMessage(content=msg["content"], role=msg["role"]) for msg in prompt_template["messages"]]
resp = portkey.chat(messages)
print(resp)
```
[**Explore Prompt Management docs here**](/product/prompt-library).
***
## 6. Continuous Improvement
Now that you know how to trace & log your Llamaindex requests to Portkey, you can also start capturing user feedback to improve your app!
You can append qualitative as well as quantitative feedback to any `trace ID` with the `portkey.feedback.create` method:
```py Adding Feedback theme={"system"}
from portkey_ai import Portkey
portkey = Portkey(
api_key="PORTKEY_API_KEY"
)
feedback = portkey.feedback.create(
trace_id="YOUR_LLAMAINDEX_TRACE_ID",
value=5, # Integer between -10 and 10
weight=1, # Optional
metadata={
# Pass any additional context here like comments, _user and more
}
)
print(feedback)
```
[**Check out the Feedback documentation for a deeper dive**](/product/observability/feedback).
## 7. Security & Compliance
When you onboard more team members to help out on your LlamaIndex app, permissioning, budgeting, and access management can become a mess! Using Portkey, you can set **budget limits** on provider API keys and implement **fine-grained user roles** and **permissions** to:
* **Control access**: Restrict team members' access to specific features, Configs, or API endpoints based on their roles and responsibilities.
* **Manage costs**: Set budget limits on API keys to prevent unexpected expenses and ensure that your LLM usage stays within your allocated budget.
* **Ensure compliance**: Implement strict security policies and audit trails to maintain compliance with industry regulations and protect sensitive data.
* **Simplify onboarding**: Streamline the onboarding process for new team members by assigning them appropriate roles and permissions, eliminating the need to share sensitive API keys or secrets.
* **Monitor usage**: Gain visibility into your team's LLM usage, track costs, and identify potential security risks or anomalies through comprehensive monitoring and reporting.
[**Read more about Portkey's Security & Enterprise offerings here**](/product/enterprise-offering).
## Join Portkey Community
Join the Portkey Discord to connect with other practitioners, discuss your LlamaIndex projects, and get help troubleshooting your queries.
[**Link to Discord**](https://portkey.ai/community)
For more detailed information on each feature and how to use them, please refer to the [Portkey Documentation](https://portkey.ai/docs).
# Microsoft Semantic Kernel
Source: https://docs.portkey.ai/docs/integrations/libraries/microsoft-semantic-kernel
# MindsDb
Source: https://docs.portkey.ai/docs/integrations/libraries/mindsdb
Integrate MindsDB with Portkey to build enterprise-grade AI use-cases
MindsDB connects to various data sources and LLMs, bringing data and AI together for easy AI automation.
With Portkey, you can run MindsDB AI systems with 250+ LLMs and implement enterprise-grade features like [LLM observability](/product/observability), [caching](/product/ai-gateway/cache-simple-and-semantic), [advanced routing](/product/ai-gateway), and more to build production-grade MindsDB AI apps.
## Prerequisites
Before proceeding, ensure the following prerequisites are met:
1. Install MindsDB locally via [Docker](https://docs.mindsdb.com/setup/self-hosted/docker) or [Docker Desktop](https://docs.mindsdb.com/setup/self-hosted/docker-desktop).
2. To use Portkey within MindsDB, install the required dependencies following [this instruction](https://docs.mindsdb.com/setup/self-hosted/docker#install-dependencies).
3. Obtain the [Portkey API key](https://app.portkey.ai) required to deploy and use Portkey within MindsDB.
## Setup
3. In your workflow, configure the OpenAI node to use your preferred model
* The model parameter in your config will override the default model in your n8n workflow
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made to your LLMs. These logs include:
* Complete request and response tracking
* Metadata tags for filtering
* Cost attribution and much more...
### 3. Unified Access to 1600+ LLMs
You can easily switch between 1600+ LLMs. Call various LLMs such as Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, AWS Bedrock, and many more by simply changing the `virtual_key` in your default `config` object.
### 4. Advanced Metadata Tracking
Using Portkey, you can add custom metadata to your LLM requests for detailed tracking and analytics. Use metadata tags to filter logs, track usage, and attribute costs across departments and teams.
Traces provide a hierarchical view of your agent's execution, showing the sequence of LLM calls, tool invocations, and state transitions.
```typescript theme={"system"}
// Add tracing to your OpenAI Agents
const portkey = new OpenAI({
baseURL: "https://api.portkey.ai/v1",
apiKey: process.env.PORTKEY_API_KEY!,
defaultHeaders: {
"x-portkey-trace-id": "unique_execution_trace_id", // Add unique trace ID
}
});
setDefaultOpenAIClient(portkey);
```
Portkey logs every interaction with LLMs, including:
* Complete request and response payloads
* Latency and token usage metrics
* Cost calculations
* Tool calls and function executions
All logs can be filtered by metadata, trace IDs, models, and more, making it easy to debug specific agent runs.
Portkey provides built-in dashboards that help you:
* Track cost and token usage across all agent runs
* Analyze performance metrics like latency and success rates
* Identify bottlenecks in your agent workflows
* Compare different agent configurations and LLMs
You can filter and segment all metrics by custom metadata to analyze specific agent types, user groups, or use cases.
Add custom metadata to your OpenAI agent calls to enable powerful filtering and segmentation:
```typescript theme={"system"}
// Add metadata to your OpenAI Agents
import OpenAI from "openai";
import { setDefaultOpenAIClient } from "@openai/agents";

const portkey = new OpenAI({
baseURL: "https://api.portkey.ai/v1",
apiKey: process.env.PORTKEY_API_KEY!,
defaultHeaders: {
"x-portkey-metadata": JSON.stringify({"agent_type": "research_agent"}), // Add custom metadata
}
});
setDefaultOpenAIClient(portkey);
```
This metadata can be used to filter logs, traces, and metrics on the Portkey dashboard, allowing you to analyze specific agent runs, users, or environments.
This enables:
* Per-user cost tracking and budgeting
* Personalized user analytics
* Team or organization-level metrics
* Environment-specific monitoring (staging vs. production)
Official OpenAI Agents SDK documentation
Example implementations for various use cases
Get personalized guidance on implementing this integration
```python theme={"system"}
# Add tracing to your OpenAI Agents
from openai import AsyncOpenAI
from agents import set_default_openai_client
import os
portkey = AsyncOpenAI(
base_url="https://api.portkey.ai/v1",
api_key=os.environ["PORTKEY_API_KEY"],
default_headers={
"x-portkey-trace-id": "unique_execution_trace_id", # Add unique trace ID
"x-portkey-provider": "@your-openai-provider-slug",
}
)
set_default_openai_client(portkey)
```
Portkey logs every interaction with LLMs, including:
* Complete request and response payloads
* Latency and token usage metrics
* Cost calculations
* Tool calls and function executions
All logs can be filtered by metadata, trace IDs, models, and more, making it easy to debug specific agent runs.
Portkey provides built-in dashboards that help you:
* Track cost and token usage across all agent runs
* Analyze performance metrics like latency and success rates
* Identify bottlenecks in your agent workflows
* Compare different agent configurations and LLMs
You can filter and segment all metrics by custom metadata to analyze specific agent types, user groups, or use cases.
Add custom metadata to your OpenAI agent calls to enable powerful filtering and segmentation:
```python theme={"system"}
from openai import AsyncOpenAI
from agents import set_default_openai_client
import os, json
portkey = AsyncOpenAI(
base_url="https://api.portkey.ai/v1",
api_key=os.environ["PORTKEY_API_KEY"],
default_headers={
"x-portkey-metadata": json.dumps({"agent_type": "research_agent"}),
"x-portkey-provider": "@your-openai-provider-slug",
}
)
set_default_openai_client(portkey)
```
This metadata can be used to filter logs, traces, and metrics on the Portkey dashboard, allowing you to analyze specific agent runs, users, or environments.
This enables:
* Per-user cost tracking and budgeting
* Personalized user analytics
* Team or organization-level metrics
* Environment-specific monitoring (staging vs. production)
Official OpenAI Agents SDK documentation
Example implementations for various use cases
Get personalized guidance on implementing this integration
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made to your LLMs. These logs include:
* Complete request and response tracking
* Metadata tags for filtering
* Cost attribution and much more...
### 3. Unified Access to 250+ LLMs
You can easily switch between 250+ LLMs. Call various LLMs such as Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, AWS Bedrock, and many more by simply changing the `virtual key` in your default `config` object.
### 4. Advanced Metadata Tracking
Using Portkey, you can add custom metadata to your LLM requests for detailed tracking and analytics. Use metadata tags to filter logs, track usage, and attribute costs across departments and teams.
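For example, here's a minimal sketch of switching providers with the Portkey SDK (the virtual key slug and model name below are placeholders for whatever you have saved in Portkey). It's shown via the client's `virtual_key` parameter; in a saved config, the same slug goes in the config's `virtual_key` field.

```python theme={"system"}
# Minimal sketch: pointing the client at a different saved virtual key
# re-routes the same request to a different provider.
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="anthropic-key",  # placeholder slug; swap it to switch providers
)

response = portkey.chat.completions.create(
    model="claude-3-5-sonnet-20240620",  # illustrative model for the chosen provider
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```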
2. Click **Create Provider** (if this is your first time using Portkey).
3. Select **Create New Integration** → choose your AI service (OpenAI, Anthropic, etc.).
4. Enter your provider’s API key and required details.
5. *(Optional)* Configure workspace and model provisioning.
6. Click **Create Integration**.
Click **Save** to finish.
**Captured metadata includes:**
* **User email** – Appears in the User column and metadata (via the special `_user` field)
* **User name** – Full name from OpenWebUI user profile
* **User role** – Admin, user, or custom roles for access control
* **Chat ID** – Track conversations and session context
* **User ID** – Unique identifier for programmatic filtering
This rich metadata enables you to filter logs by specific users, attribute costs to departments, and maintain complete audit trails—all without requiring individual API keys per user.
***
### How the Manifold Pipe Solves Enterprise Attribution
The manifold pipe bridges OpenWebUI's user context with Portkey's observability, solving a critical enterprise pain point:
Leverage Portkey Configs to enforce provider- or model-specific guardrails without editing your OpenWebUI setup.
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made to your LLMs. These logs include:
* Complete request and response tracking
* Metadata tags for filtering
* Cost attribution and much more...
### 3. Unified Access to 1600+ LLMs
You can easily switch between 1600+ LLMs. Call various LLMs such as Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, AWS Bedrock, and many more by simply changing the `model slug` in your default `config` object.
### 4. Advanced Metadata Tracking
Using Portkey, you can add custom metadata to your LLM requests for detailed tracking and analytics. Use metadata tags to filter logs, track usage, and attribute costs across departments and teams.
***
## 4. Avoid Promptfoo Rate Limits & Leverage Cache
Since promptfoo can fire off a large number of calls very quickly, you can use a load-balanced config in Portkey with caching enabled, and pass the config header in the same YAML.
Here's a sample config that you can save in the Portkey UI to get its config slug:
```json theme={"system"}
{
"cache": { "mode": "simple" },
"strategy": { "mode": "loadbalance" },
"targets": [
{ "provider":"@ACCOUNT_ONE" },
{ "provider":"@ACCOUNT_TWO" },
{ "provider":"@ACCOUNT_THREE" }
]
}
```
And then we can just add the saved Config's slug in the YAML:
```yaml theme={"system"}
providers:
  - id: portkey:claude-3-opus-20240229
    config:
      portkeyConfig: PORTKEY_CONFIG_SLUG
```
***
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made by Roo. These logs include:
* Complete request and response tracking
* Code context and generation metrics
* Developer attribution
* Cost breakdown per coding session
### 3. Unified Access to 250+ LLMs
Easily switch between 250+ LLMs for different coding tasks. Use GPT-4 for complex architecture decisions, Claude for detailed code reviews, or specialized models for specific languages - all through a single interface.
### 4. Advanced Metadata Tracking
Track coding patterns and productivity metrics with custom metadata:
* Language and framework usage
* Code generation vs completion tasks
* Time-of-day productivity patterns
* Project-specific metrics
### Reliability
Portkey enhances the robustness of your AI applications with built-in features such as [Caching](/product/ai-gateway/cache-simple-and-semantic), [Fallback](/product/ai-gateway/fallbacks) mechanisms, [Load balancing](/product/ai-gateway/load-balancing), [Conditional routing](/product/ai-gateway/conditional-routing), [Request timeouts](/product/ai-gateway/request-timeouts), etc.
Here is how you can modify your config to include these Portkey features:
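As a minimal sketch (the virtual key slugs below are placeholders), a config that adds automatic retries and a fallback between two providers could look like this, attached directly to the client:

```python theme={"system"}
# Minimal sketch: retries + fallback expressed as a config object
from portkey_ai import Portkey

config = {
    "retry": {"attempts": 3},              # retry transient failures up to 3 times
    "strategy": {"mode": "fallback"},      # if the first target errors out, try the next
    "targets": [
        {"virtual_key": "openai-key"},     # placeholder primary provider
        {"virtual_key": "anthropic-key"},  # placeholder fallback provider
    ],
}

portkey = Portkey(api_key="PORTKEY_API_KEY", config=config)
```

You can also save the same JSON in the Portkey UI and pass its config slug instead of the inline object.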
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made to your LLMs. These logs include:
* Complete request and response tracking
* Metadata tags for filtering
* Cost attribution and much more...
### 3. Unified Access to 1600+ LLMs
You can easily switch between 1600+ LLMs. Call various LLMs such as Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, AWS Bedrock, and many more by simply changing the `virtual key` in your default `config` object.
### 4. Advanced Metadata Tracking
Using Portkey, you can add custom metadata to your LLM requests for detailed tracking and analytics. Use metadata tags to filter logs, track usage, and attribute costs across departments and teams.
1. Click **"Add Key"** and enable the **"Local/Privately hosted provider"** toggle
2. Configure your deployment:
* Select the matching provider API specification (typically `OpenAI`)
* Enter your model's base URL in the `Custom Host` field
* Add required authentication headers and their values
3. Click **"Create"** to generate your virtual key
#### Step 2: Use Your Virtual Key in Requests
After creating your virtual key, you can use it in your applications:
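For example, here's a minimal sketch with the Portkey SDK (the virtual key slug and model name are placeholders for your own deployment):

```python theme={"system"}
# Minimal sketch: calling a privately hosted model through its virtual key
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="local-llm-key",  # the virtual key created in Step 1 (placeholder slug)
)

response = portkey.chat.completions.create(
    model="llama-3-8b-instruct",  # whatever model your private endpoint serves
    messages=[{"role": "user", "content": "Hello from a self-hosted model!"}],
)
print(response.choices[0].message.content)
```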
## Troubleshooting
| Issue | Possible Causes | Solutions |
| ----------------------- | --------------------------------------------------- | ------------------------------------------------------------------------------ |
| Connection Errors | Incorrect URL, network issues, firewall rules | Verify URL format, check network connectivity, confirm firewall allows traffic |
| Authentication Failures | Invalid credentials, incorrect header format | Check credentials, ensure headers are correctly formatted and forwarded |
| Timeout Errors | LLM server overloaded, request too complex | Adjust timeout settings, implement load balancing, simplify requests |
| Inconsistent Responses | Different model versions, configuration differences | Standardize model versions, document expected behavior differences |
## FAQs
In addition to the `user` parameter, Portkey allows you to send arbitrary custom metadata with your requests. This powerful feature enables you to associate additional context or information with each request, which can be useful for analysis, debugging, or other custom use cases.
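Here's a minimal sketch of attaching such metadata with the Portkey SDK (the keys and values are only examples):

```python theme={"system"}
# Minimal sketch: custom metadata attached to every request made by this client
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="openai-key",      # placeholder virtual key slug
    metadata={
        "_user": "user_123",       # the special user field
        "environment": "staging",  # any custom keys you want to filter or report on
        "team": "search",
    },
)

response = portkey.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hi"}],
)
```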
## Frequently Asked Questions
Key features:
* Automated evaluation scores for each model response
* Detailed trace analysis with quality metrics
* Comparison views across different models
### Portkey Dashboard - Operational View
Access your Portkey dashboard to see operational metrics for all API calls:
Key metrics:
* **Unified Logs**: Single view of all requests across providers
* **Cost Tracking**: Automatic cost calculation for every call
* **Latency Monitoring**: Response time comparisons across models
* **Token Usage**: Detailed token consumption analytics
## Advanced Use Cases
### Complex Agentic Workflows
The integration supports tracing complex workflows where you chain multiple LLM calls:
```python theme={"system"}
# Example: E-commerce assistant with multiple LLM calls
async def ecommerce_assistant_workflow(user_query):
# Step 1: Intent classification
intent = await classify_intent(user_query)
# Step 2: Product search
products = await search_products(intent)
# Step 3: Generate response
response = await generate_response(products, user_query)
# All steps are automatically traced and evaluated
return response
```
### CI/CD Integration
Leverage this integration in your CI/CD pipelines for:
* **Automated Model Testing**: Run evaluation suites on new model versions
* **Quality Gates**: Set thresholds for evaluation scores before deployment
* **Performance Monitoring**: Track degradation in model quality over time
* **Cost Optimization**: Monitor and alert on cost spikes
## Benefits
## Migration Guide
If you're already using HoneyHive with OpenAI, migrating to use Portkey is simple:
## Migration Guide
If you're already using Langfuse with OpenAI, migrating to use Portkey is simple:
## Migration Guide
If you're already using LangSmith with OpenAI, migrating to use Portkey is simple:
# MLflow Tracing
Source: https://docs.portkey.ai/docs/integrations/tracing-providers/ml-flow
Enhance LLM observability with automatic tracing and intelligent gateway routing
[MLflow Tracing](https://mlflow.org/docs/latest/llms/tracing/index.html) is a feature that enhances LLM observability in your Generative AI (GenAI) applications by capturing detailed information about the execution of your application's services. Tracing provides a way to record the inputs, outputs, and metadata associated with each intermediate step of a request, enabling you to easily pinpoint the source of bugs and unexpected behaviors.
# OpenLIT
Source: https://docs.portkey.ai/docs/integrations/tracing-providers/openlit
Simplify AI development with OpenTelemetry-native observability and intelligent gateway routing
[OpenLIT](https://openlit.io/) allows you to simplify your AI development workflow, especially for Generative AI and LLMs. It streamlines essential tasks like experimenting with LLMs, organizing and versioning prompts, and securely handling API keys. With just one line of code, you can enable OpenTelemetry-native observability, offering full-stack monitoring that includes LLMs, vector databases, and GPUs.
# OpenTelemetry Python SDK
Source: https://docs.portkey.ai/docs/integrations/tracing-providers/opentelemetry-python-sdk
Direct OpenTelemetry instrumentation with full control over traces and intelligent gateway routing
The [OpenTelemetry SDK](https://opentelemetry.io/docs/languages/python/) provides direct, fine-grained control over instrumentation in your LLM applications. Unlike automatic instrumentation libraries, the SDK allows you to manually create spans and set attributes exactly where and how you need them.
# Phoenix(Arize) Open-Telemetry
Source: https://docs.portkey.ai/docs/integrations/tracing-providers/phoenix
AI observability and debugging platform with OpenInference instrumentation and intelligent gateway routing
[Arize Phoenix](https://phoenix.arize.com/) is an open-source AI observability platform designed to help developers debug, monitor, and evaluate LLM applications. Phoenix provides powerful visualization tools and uses OpenInference instrumentation to automatically capture detailed traces of your AI system's behavior.
# Traceloop (OpenLLMetry)
Source: https://docs.portkey.ai/docs/integrations/tracing-providers/traceloop
[Traceloop's OpenLLMetry](https://www.traceloop.com/docs/openllmetry/introduction) is an open source project that allows you to easily start monitoring and debugging the execution of your LLM app.
# Milvus
Source: https://docs.portkey.ai/docs/integrations/vector-databases/milvus
[Milvus](https://milvus.io/) is an open-source vector database built for GenAI applications.
It is built to be performant and scale to tens of billions of vectors with minimal performance loss.
Portkey provides a proxy to Milvus - you can log your Milvus requests and manage auth for your Milvus clusters on Portkey.
## Portkey SDK Integration with Milvus
Portkey provides a consistent API to interact with models from various providers. To integrate Milvus with Portkey:
### 1. Install the Portkey SDK
Add the Portkey SDK to your application to interact with Milvus through Portkey's gateway.
### Through the API
You can also create keys programmatically:
Every administrative action is recorded with:
* User identity
* Action type and target resource
* Timestamp
* IP address
* Request details
This audit trail helps maintain compliance and provides accountability for all administrative changes.
Based on your access level, you might see the relevant permissions on the API key modal - tick the ones you'd like, name your API key, and save it.
## Chat Completions Example
Save your Azure OpenAI details [on Portkey](/integrations/llms/azure-openai#portkey-sdk-integration-with-azure-openai) to get a virtual key.
```csharp [expandable] theme={"system"}
using OpenAI;
using OpenAI.Chat;
using System.ClientModel;
using System.ClientModel.Primitives;
public static class Portkey
{
private class HeaderPolicy : PipelinePolicy
{
        private readonly Dictionary
        // ... (snippet truncated)
```
We're introducing comprehensive usage controls for both Virtual Keys and API Keys, giving platform teams precise control over LLM access and resource consumption. This release introduces:
* **Time-based Access Control**: Create short-lived keys that automatically expire after a specified duration – perfect for temporary access needs like POCs or time-limited projects
* **Resource Consumption Limits**: Set granular limits including:
  * Requests per minute (RPM) / requests per hour / requests per day
* Tokens per minute (TPM) / Tokens per hour / Tokens per day
* Budget caps based on cost incurred or tokens consumed, with periodic reset options (weekly/monthly)
**Enhanced Provider Features**
* Perplexity Integration: Full support for Perplexity API's advanced features including search domain filtering, related questions generation, and citation capabilities
**Universal Identity Management**
* **SSO Integration**: Support for all major identity providers through OIDC/SAML standards, enabling seamless enterprise authentication
* **Automated User Management**: SCIM provisioning for automatic user lifecycle management - from onboarding to role changes and offboarding
* **Granular Access Control**: Define precise access patterns and manage permissions at both user and workspace levels
* **Workspace Management API**: Programmatically manage workspaces, user invites, and access controls
**Private Deployments**
Updated documentation for fully private Portkey installations with enhanced security configurations [(*Docs*)](https://github.com/Portkey-AI/helm/tree/main/charts)
## Integrations
**New Providers**
We [won](https://www.linkedin.com/posts/1rohitagarwal_we-just-won-the-best-growth-strategy-award-activity-7272134964110868480-_mvc/?utm_source=share\&utm_medium=member_desktop) the NetApp Excellerator Award, launched [prompt.new](https://prompt.new/) for faster development, added folder organization and AI suggestions for prompt templates, and introduced multi-workspace analytics.
Plus, there's now support for OpenAI's Realtime API and much more. Let's dive in!
## Summary
| Area | Key Updates |
| :----------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Platform | • See multi-workspace analytics & logs on a single dashboard |
When Premera Blue Cross’ Director of Platform Engineering needed an AI Gateway, they chose Portkey. Why? Because traditional API gateways [weren’t built](https://portkey.ai/blog/ai-gateway-vs-api-gateway) for AI-first companies. Are you in the same boat? Schedule an [expert consultation here](https://calendly.com/portkey-ai/quick-consult).
***
## Platform
#### Prompt Management
* Type [prompt.new](https://prompt.new) in your browser to spin up a new prompt playground! [Try it now →](https://prompt.new)
* Organize your prompt templates with folders and subfolders:
* Use AI to write and improve your prompts - right inside the playground:
* Add custom tags/labels like `staging`, `production` to any prompt version to track changes, and call them directly:
#### Analytics
**Org-wide Executive Reports**
Monitor analytics and logs across all workspaces in your organization through a unified dashboard. This centralized view provides comprehensive insights into cost, performance, and accuracy metrics for your deployed AI applications.
* Track token usage patterns across requests & responses
* You can now filter logs and analytics with specific Portkey API keys. This is useful if you are tying a particular key to an internal user and want to see their usage!
#### Enterprise
We've strengthened our enterprise authentication capabilities with comprehensive cloud provider integrations.
* Expanded AWS authentication options, for adding your Bedrock models or Sagemaker deployments:
* IMDS-based auth (recommended for AWS environments)
* IRSA-based auth for Kubernetes workloads
* Role-based auth for non-AWS environments
* STS integration with assumed roles
* Also expanded the Azure Integration:
* Azure Entra (formerly Active Directory)
* Managed identity support
* Granular access permissions for API Keys and Virtual Keys across your organization
* Support for sending Azure `deploymentConfig` while making Virtual Keys through API. [Docs](/api-reference/admin-api/control-plane/virtual-keys/create-virtual-key)
***
#### More Customer Love
Felipe & team are building [beconfident](https://beconfident.app/), and here's what they had to say about Portkey:
> "Now that we've seen positive results, we're going to move all our prompts to Portkey."
***
## Integrations
#### Providers
**Partner blog**
[See](https://portkey.ai/blog/securing-your-ai-via-ai-gateways/) how Portkey and Pillar together can help you build secure GenAI apps for production.
### Community Contributors
A special thanks to our community contributors this month:
* [unsync](https://github.com/unsync)
* [francescov1](https://github.com/francescov1)
* [Ajay Satish](https://github.com/Ajay-Satish-01)
## Coming this month!
We're changing how agents go to production, from first principles. [Watch out for this](https://x.com/PortkeyAI/status/1912491547653701891) 👀
## Support
Join us for an upcoming webinar in partnership with Palo Alto Networks where we'll explore best practices for securing AI infrastructure at scale, implementing enterprise-grade guardrails, and maintaining compliance while enabling innovation.
Click here to [register](https://luma.com/z84zjko5).
## Platform
### List models API
You can now see a list of all models available through Portkey—along with basic details for each one. This makes it easier to discover which models you can use and compare providers and options. Read more about it [here](https://portkey.ai/docs/api-reference/inference-api/models/models).
### Unified + Portkey Batching
Optimize throughput and reduce costs with our improved batching capabilities. We've implemented unified batching across all supported providers with intelligent request grouping for optimized performance.
This reduces per-request overhead while providing automatic batch size optimization and configurable parameters to suit your specific needs.
See how you can get started [here](https://portkey.ai/docs/product/ai-gateway/batches).
### **OTEL-based autoinstrumentation**
Portkey now works as an OTel endpoint. Send telemetry from any OTel-compatible source into Portkey and view it alongside LLM call logs. End-to-end observability for performance, cost, and compliance. Super powerful for agent workflows.
See how you can implement this [here](https://portkey.ai/docs/product/observability/opentelemetry).
This strengthens our observability even further!
### **Multiple guardrails at the workspace level**
You can now define and enforce multiple guardrail policies at the workspace-level, making it easier to secure usage, filter content, and maintain compliance across your team.
### CRUD Guardrails API
Managing guardrails at scale just got easier. You can now create, update, retrieve, delete, and list guardrails directly via API.
This makes it simple to keep policies consistent across environments, automate changes, and integrate guardrail management into your CI/CD workflows. [Learn more](https://portkey.ai/docs/api-reference/admin-api/control-plane/guardrails/create-guardrail).
## New Models and Providers
* **Featherless.ai**: Access 11,900+ open-source models with unlimited tokens through a single gateway. [Learn more →](https://portkey.ai/docs/integrations/llms/featherless)
* **GPT-OSS Models**: OpenAI's new open-weight models with strong reasoning and tool use capabilities.
* **Claude Opus 4.1**: Anthropic's latest flagship model with 74.5% performance on SWE-bench Verified.
## 🌐 Community Highlights
### **Future of AI Platforms**
Leaders from **DoorDash, Postman, Qure.ai, and Qoala** joined us for the Future of AI Platforms. It turned into a night of sharp ideas, candid discussions, and bold visions for where AI is headed.
Where should we host it next?
👉 [Send us an email](mailto:vrushank.v@portkey.ai) if you'd be interested in attending!
### Enterprise Security with Falco Vanguard
[Falco Vanguard](https://www.linkedin.com/posts/migueladelossantos_falco-cloudnativesecurity-aiengineering-activity-7358991765049122816-YHDS) is using Portkey's AI infrastructure to build an innovative security platform that intelligently clusters security events and prioritizes critical threats.
Their offline-first approach ensures data sovereignty while their integration with Portkey provides the reliability and production readiness enterprises demand. We're excited to see how this collaboration transforms security operations for organizations.
Read more about this [here →](https://www.linkedin.com/posts/migueladelossantos_falco-cloudnativesecurity-aiengineering-activity-7358991765049122816-YHDS).
## Resources
* Blog: [Simplifying LLM batch inference](https://portkey.ai/blog/simplifying-llm-batch-inference)
* Blog: [OTel traces with LLM logs for end-to-end observability agent workflows](https://portkey.ai/blog/otel-with-llm-observability-for-agents)
## Community Contributors
A special thanks to our contributors this month:
* [pnkvalavala](https://github.com/pnkvalavala)
* [indranil-kar-cloudesign](https://github.com/indranil-kar-cloudesign)
* [horochx](https://github.com/horochx)
## Coming this month!
Our MCP gateway is almost here! Reach out to us on [support@portkey.ai](mailto:support@portkey.ai) for early access!
## Support
February continues the momentum with powerful enterprise features:
* **Azure Marketplace**: Portkey is now available on Azure Marketplace for simplified enterprise procurement
* **Multiple Owners**: Organizations can now have multiple owner accounts for improved management
* **Enhanced Role Management**: Change member roles directly from the UI
* **User Key Creation**: Create user-specific keys directly from the UI interface
* **Default Configs**: Attach default configurations and metadata to any API key you create
* **Performance Optimization**: Updated cache implementation to avoid redundant Redis calls
* **Browser SDK Support**: Run our SDK directly in the browser with Cross-Origin access support
## New Models & Integrations
"Describing Portkey as merely useful would be an understatement; it's a must-have." - @AManInTech
## Our Stories
**Kicking off 2025 with major releases! 🎉**
January marks a milestone for Portkey with our first industry report — we analyzed over 2 trillion tokens flowing through Portkey to find out production patterns for LLMs.
We're also expanding our platform capabilities with advanced PII redaction, JWT authentication, comprehensive audit logs, unified files & batches API, and support for private LLMs. Latest LLMs like Deepseek R1, OpenAI o3, and Gemini thinking model are also integrated with Portkey.
Plus, we are attending the [AI Engineer Summit in New York](https://x.com/PortkeyAI/status/1886629690615747020) in February, and hosting in-person meetups in [Mumbai](https://lu.ma/bgiyw0cy) & [NYC](https://lu.ma/vmf0egzl).
Let's dive in!
## Summary
| Area | Key Updates |
| :----------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Benchmark | • Released [LLMs in Prod Report 2025](https://portkey.ai/llms-in-prod-25) analyzing 2T+ tokens |
Our comprehensive analysis of 2T+ tokens processed through Portkey's Gateway reveals fascinating insights about how teams are deploying LLMs in production. Here are the key findings:
They were able to do this because of three things:
* They could build reusable prompts with our partial templates
* Our versioning let them confidently roll out changes
* And they didn't have to refactor anything thanks to our OpenAI-compatible APIs
***
## Integrations
#### Models & Providers
We are attending the [AI Engineer Summit in NYC](https://x.com/PortkeyAI/status/1886629690615747020) this February and have some extra event passes to share! Reach out to us [on Discord](https://portkey.wiki/community) to ask for a pass.
We are also hosting small meetups in NYC and Mumbai this month to meet with local engineering leaders and ML/AI platform leads. Register for them below:
Last month we hosted an inspiring AI practitioners meetup with Ojasvi Yadav and Anudeep Yegireddi to discuss the role of Event-Driven Architecture in building Multi-Agent Systems using MCP.
[Read event report here →](https://portkey.ai/blog/event-driven-architecture-for-ai-agents)
Essential reading for your AI infrastructure:
* [LLMs in Prod Report 2025](https://portkey.ai/llms-in-prod-25): Comprehensive analysis of production LLM usage patterns
* [The Real Cost of Building an LLM Gateway](https://portkey.ai/blog/the-cost-of-building-an-llm-gateway/): Understanding infrastructure investments
* [Critical Role of Audit Logs](https://portkey.ai/blog/beyond-implementation-why-audit-logs-are-critical-for-enterprise-ai-governance/): Enterprise AI governance
* [Error Library](https://portkey.ai/error-library): New documentation covering common errors across 30+ providers
* [Deepseek on Fireworks](https://x.com/PortkeyAI/status/1885231024483033295): How to use Portkey with Fireworks to call Deepseek's R1 model for reasoning tasks
## Improvements
* Token counting is now more accurate for Anthropic streams
* Added logprobs for Vertex AI
* Improved usage object mapping for Perplexity
* Error handling is more robust across all SDKs
***
## Support
Debugging just got simpler. You can now view the original request as received by Portkey, the transformed version sent to the model provider, and the same for the response, right in your logs, making debugging easier.
**Privacy mode (logging off)**
Portkey now gives org owners granular control over what gets logged, from full request/response payloads to just minimal metadata.
This can be configured org-wide or per workspace, offering flexibility for sensitive or regulated use cases. [Configure it for your org](https://portkey.ai/docs/product/administration/configuring-request-logging).
**Automatic user attribution on API keys**
Every request made using a user API key now automatically carries `_user` metadata, with an option to override.
## Guardrails
**Palo Alto Networks’ AIRS Plugin**
Portkey now integrates with PANW AIRS (AI Runtime Security) to enforce guardrails that block risky prompts or model responses based on real-time security analysis. [Learn more](https://portkey.ai/docs/integrations/guardrails/palo-alto-panw-prisma)
**CRUD Guardrails endpoint**
We’ve added full support for managing guardrails via API — create, update, delete, and list guardrail rules and templates programmatically. [Learn more](https://portkey.ai/docs/api-reference/admin-api/control-plane/guardrails/create-guardrail)
## Gateway
**Support for the messages route (beta)**
In addition to `chat.completions`, Portkey now supports the `messages` route (beta) for Anthropic, AWS Bedrock, and Vertex AI. This includes tool calling, thinking, multi-turn conversations, and redacted thinking support.
## New models and providers
## Resources
* Cookbook: [Arize + Portkey: Multi-LLM debate with traces and evals](https://portkey.ai/docs/guides/integrations/arize-portkey)
* Blog: [How to add enterprise controls to OpenWebUI](https://portkey.ai/blog/how-to-add-enterprise-controls-to-openwebui)
* Blog: [Everything We Know About Claude Code Limits](https://portkey.ai/blog/claude-code-limits)
## Community Contributors
A special thanks to our contributors this month:
* [AG2AI-Admin](https://github.com/AG2AI-Admin)
* [Mishalabdullah](https://github.com/Mishalabdullah)
## Coming this month!
MCP Connectors on the gateway! Reach out to us on [support@portkey.ai](mailto:support@portkey.ai) for a sneak peek!
## Support
Here’s everything that went live in June.
## Summary
| Area | Key Highlights |
| :---------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Platform** | • Model Catalog launch |
## New models and providers
* `service_tier` flag added
* **Prompt Security** – Secure every prompt and response in real time by embedding Prompt Security directly into Portkey’s AI Gateway. [Read more here](https://portkey.ai/blog/why-llm-security-is-non-negotiable/)
* **Lasso Security** – Combine infra-level controls and real-time behavioral monitoring to secure the entire LLM lifecycle — from access to output. [Read more here](https://portkey.ai/blog/how-to-secure-your-entire-llm-lifecycle)
* **FutureAGI** – Use Portkey as the control layer and FutureAGI as the eval layer to automate output scoring across all model traffic. [See how you can implement this](https://portkey.ai/docs/integrations/tracing-providers/future-agi)
* **Arize AI** – Connect Portkey’s routing and guardrails with Arize’s observability to monitor model drift, latency, cost, and quality in one flow. [Read more here](https://portkey.ai/docs/integrations/tracing-providers/arize)
## Portkey Live!
In partnership with [**Pangea**](https://pangea.cloud/), we hosted a live webinar on how to build scalable, secure GenAI infrastructure. Catch the replay here!
**Improvements**
* Renamed Model Whitelist to Allowed Models for clarity and consistency
* Improved error responses for webhook failures, making them easier to debug and handle programmatically
**Teams love Portkey!**
If you love Portkey, drop a ⭐ on [GitHub](https://portkey.sh/home-git)
## Resources
* Cookbook: [Optimizing Prompts with LLama Prompt Ops](https://portkey.ai/docs/guides/prompts/llama-prompts)
* Cookbook: [OpenAI Computer Use Tool](https://portkey.ai/docs/guides/use-cases/openai-computer-use#portkey-with-openai-computer-use)
* Blog: [Building AI agent workflows with the help of an MCP gateway](https://portkey.ai/blog/building-ai-agent-workflows-with-the-help-of-an-mcp-gateway/)
* Blog: [Balancing AI model accuracy, performance, and costs](https://portkey.ai/blog/balancing-model-accuracy-performance-and-costs-with-an-ai-gateway/)
**Docs are now open for contributions!**
We’re opening up our documentation for contributions. If you’ve found parts that could be clearer, better explained, or just more complete, we’d truly appreciate your help.
Every suggestion, edit, or fix helps make Portkey better for the whole community. [See how you can contribute](https://portkey.ai/docs/README)
## Community Contributors
A special thanks to our community contributors this month:
* [Shubhwithai](https://github.com/Shubhwithai)
* [DarinVerheijke](https://github.com/DarinVerheijke)
* [jroberts2600](https://github.com/jroberts2600)
## Coming this month!
Struggling with unauthorized tool usage in MCP? Portkey is about to solve that. Stay tuned.
## Support
Our flagship release this month is the official launch of the Prompt Engineering Studio, bringing professional-grade prompt development to teams of all sizes:
* **Version control**: Track changes, compare versions, and roll back when needed
* **Collaborative workflow**: Work together with your team on prompt development
* **Variables & templates**: Create reusable prompt components and patterns
* **Testing framework**: Validate performance before production deployment
* **Production integration**: Seamlessly connect to your applications
Read about our design journey in our [detailed case study](https://portkey.ai/blog/portkey-prompt-engineering-studio-a-user-centric-design-facelift/).
**Claude Multimodal Capabilities**
You can now send images to Claude models across various providers:
* Send image URLs to Claude via Anthropic, Vertex, or Bedrock APIs
* Full support for multimodal conversations and analysis
* Consistent interface across all Claude providers
**PDF Support for Claude**
Enhance your document processing workflows with native PDF support:
* Send PDF files directly to Claude requests
* Process long-form documents without manual extraction
* Maintain formatting and structure in analysis
**Thinking Mode Expansion**
Access model reasoning across all major providers:
* Support for Anthropic (Bedrock, Vertex), OpenAI, and more
* Full compatibility with streaming responses
* Complete observability of reasoning process
* Consistent interface across all supported models
## Enterprise
**University Validation**
We're proud to announce that Portkey is being evaluated as the official AI Gateway solution by leading academic institutions:
* Harvard University
* Princeton University
* University of California, Berkeley
* Cornell University
* New York University
* Lehigh University
* Bowdoin College
Learn more about the [Internet2 NET+ AI service evaluation](https://internet2.edu/new-net-service-evaluations-for-ai-services/).
**Enhanced Security Controls**
* **AWS KMS Integration**: Bring your own encryption keys for maximum security
* **SCIM Support**: Automated user provisioning with Okta & Azure Entra (AD)
* **Organizational Controls**: Enforce guardrails and metadata requirements at the org level
* **Usage Limit Notifications**: Configure email alerts for rate/budget/usage thresholds
**Simplified Deployment**
* **CloudFormation Template**: 1-click deployment of Portkey Gateway on AWS EC2
* **Real-Time Model Pricing**: Pricing configs now fetched dynamically from control plane
* **Internal POD Communication**: Secure HTTPS between components
* **Enhanced Metrics**: Track last byte latency for streaming responses
## Gateway & Providers
**New Providers**
"Describing Portkey as merely useful would be an understatement; it's a must-have." - @AManInTech
### Community Contributors
A special thanks to our community contributors this month:
* [urbanonymous](https://github.com/urbanonymous)
* [vineye25](https://github.com/vineye25)
* [Ignacio](https://github.com/elentaure)
* [Ajay Satish](https://github.com/Ajay-Satish-01)
## Support
With Portkey’s deep integration into the Azure AI ecosystem (OpenAI, Foundry, APIM, Marketplace), teams can now build, scale, and govern GenAI apps without leaving their existing cloud setup.
Our customers are vouching for it!
## Portkey for AI Tools
You can now assign multiple labels to a single prompt version, making it easy to promote a version across environments like staging and production.
**Gateway to any API**
Portkey now supports `GET`, `PUT`, and `DELETE` HTTP methods in addition to `POST`, allowing you to route requests to any external or self-hosted provider endpoint. This means you can connect to custom APIs directly through Portkey with full observability for every call.
**OTel Integration (Analytics Data)**
You can now export Portkey analytics to any OpenTelemetry (OTel)-compatible collector, integrating easily into your existing observability stack.
**Improvements**
* Token cost tracking is now available for `gpt-image-1`.
* Ping messages are removed from streamed responses.
* You can now resize metadata columns in the logs view
**This is what keeps us going!**
## New Models & Providers
## Resources
* Cookbook: [Optimizing Prompts with LLama Prompt Ops](https://portkey.ai/docs/guides/prompts/llama-prompts)
* Cookbook: [OpenAI Computer Use Tool](https://portkey.ai/docs/guides/llms/openai-computer-use-tool)
* Guardrail documentation is now located under “Integrations”.
* Expanded guides for agent frameworks, including CrewAI and LangGraph.
## Community Contributors
A special thanks to our community contributors this month:
* [unsync](https://github.com/unsync)
* [tomukmatthews](https://github.com/tomukmatthews)
* [jroberts2600](https://github.com/jroberts2600)
## Coming this month!
Provision and manage LLM access across your entire org from a single admin panel. Centralized controls. Granular permissions. Stay tuned.
## Support
Portkey has been recognized as one of the [2025 Gartner® Cool Vendors™ in LLM Observability](https://www.gartner.com/en/documents/7024598), in the report by Padraig Byrne, Tanmay Bisht, and Andre Bridges. This recognition reinforces our mission to help teams move from experimentation to production with confidence and observability built in.
If you like what we are building, please drop us a review here, it'd mean a lot to us!
## What industry leaders are telling about us!
## Platform
### Terraform Provider for Portkey
Portkey now has a Terraform provider for managing workspaces, users, and organization resources through the Portkey Admin API. This enables you to manage:
* Workspaces: Create and update workspaces for teams and projects
* Members: Assign users to workspaces with defined roles
* Access: Send user invites with organization and workspace access
* Users: Query and manage existing users in your organization
### Enhanced rate limits
You can now configure rate limits for both requests and tokens under each API key, giving teams precise control over workload, costs, and performance across large deployments.
* Request-based limits: Cap the number of requests per minute, hour, or day.
* Token-based limits: Cap the number of tokens consumed per minute, hour, or day.
### Configurable request timeouts
A new `REQUEST_TIMEOUT` environment variable lets you control how long the Gateway waits before timing out outbound LLM requests — helping fine-tune latency, retries, and reliability for large-scale workloads.
## Customer love!
## Guardrails
### Javelin AI Security integration
You can now configure Javelin AI Security guardrails directly in the Portkey Gateway to evaluate every model interaction for:
* Trust & Safety — detect and filter harmful or unsafe content
* Prompt Injection Detection — identify attempts to manipulate model behavior
* Language Detection — verify and filter language in user or model responses
These guardrails extend Portkey’s security layer, making it easier to enforce safe and compliant model use at scale. See how to set up Javelin guardrails [here](https://portkey.ai/docs/integrations/guardrails/javelin).
### Add Prefix guardrail
The new Add Prefix guardrail lets you automatically prepend a configurable prefix to user inputs before sending them to the model, useful for enforcing consistent tone, context, or compliance instructions across all model interactions.
### Allowed Request Types guardrail
With the Allowed Request Types guardrail, you can now define which request types (endpoints) are permitted through the Gateway. Use an allowlist or blocklist approach to control access at a granular level and restrict unwanted request types.
## Gateway
### New models and providers
Tool calls made during model interactions are now automatically detected and displayed as **separate, structured entries** in the Portkey logs.\
This makes it easier to trace the tools invoked, inspect parameters, and understand agent behavior across multi-step workflows.
### Enhanced view for web search
Web search results from Anthropic models now appear in a **clearer, formatted view** within the dashboard.\
This update improves readability and makes it easy to see what information the model used to generate its response, enabling faster analysis and better transparency.
## Customer Stories: From arm pain to AI gateway
After an unexpected setback forced Rahul Bansal to rethink how he worked, he built Dictation Daddy, an AI-powered dictation platform now used by professionals, doctors, and lawyers to write faster and with greater accuracy, powered by Portkey.
Read the full story [here](https://portkey.ai/blog/from-arm-pain-to-ai-gateway/)
## Community & Events
### MCP Gateway for Higher Education
We’re teaming up with Internet2 to host a session on how higher education institutions can implement the Model Context Protocol (MCP) securely and at scale.
[Join us](https://luma.com/4i4qspq0) to learn how MCP fits within existing campus IT frameworks, what governance models are emerging, and the practical steps universities can take to enable secure, compliant AI access across departments.
### LibreChat in Production
LibreChat makes it easy to build chat-based AI experiences but adding governance, access control, and observability is key to running it in production.
Join us to see how Portkey brings budgets, RBAC, and usage visibility to LibreChat, while connecting to 1,600+ LLMs across providers.
## Resources
* **Blog**: [Observability is now a business function for AI](https://portkey.ai/blog/observability-is-now-a-business-function-for-ai)
* **Blog**: [Using OpenAI AgentKit with Anthropic, Gemini and other providers](https://portkey.ai/blog/using-openai-agentkit-with-anthropic-gemini-and-other-providers)
* **Blog**: [What we think of the Opentelemetry semantic conventions for GenAI traces](https://portkey.ai/blog/opentelemetry-semantic-conventions-for-genai-traces)
* **Blog**: [Comparing lean LLMs: GPT-5 Nano and Claude Haiku 4.5](https://portkey.ai/blog/gpt-5-nano-vs-claude-haiku-4-5)
## Community Contributors
A special thanks to our contributors this month: [TensorNull](https://github.com/TensorNull), [miuosz](https://github.com/miuosz), [marsianin](https://github.com/marsianin), and [uc4w6c](https://github.com/uc4w6c)
## Support
This goes beyond dashboard reporting so developers can make smarter runtime decisions with accurate, provider-agnostic token counts.
## Guardrails
### Metadata-based Model Access guardrail
We introduced a new guardrail that lets you restrict model access based on metadata key-value pairs at runtime.
By evaluating metadata dynamically on every request, this guardrail provides granular, per-request governance—without requiring changes to your app logic.
### Regex-replace guardrail
Our PII guardrail already covers common fields like emails, phone numbers, and credit cards. But many customers asked for a way to redact custom org-specific patterns.
With the Regex Replace Guardrail, you can now:
* Define your own regex patterns
* Replace matches with a chosen string (e.g., \[masked\_user])
* Enforce masking rules at runtime, before data reaches the model
This is particularly useful for internal IDs, employee codes, or project references that shouldn't leave your environment. [Read more here](https://portkey.ai/docs/integrations/guardrails/regex).
## Gateway and Providers
### Unified `finish_reason` parameter
We standardized the `finish_reason` field across all providers. By default, values are mapped to OpenAI-compatible outputs, ensuring consistent handling across multi-provider deployments.
If you prefer to keep the original provider-returned value, set `x-portkey-strict-openai-compliance = false`.
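For example, here's a minimal sketch of setting this header on an OpenAI-SDK client routed through Portkey (the provider slug is a placeholder):

```python theme={"system"}
# Minimal sketch: keep the provider's native finish_reason values for this client
from openai import OpenAI

client = OpenAI(
    base_url="https://api.portkey.ai/v1",
    api_key="PORTKEY_API_KEY",
    default_headers={
        "x-portkey-provider": "@your-provider-slug",    # placeholder provider slug
        "x-portkey-strict-openai-compliance": "false",  # opt out of OpenAI-mapped values
    },
)
```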
### Conditional router enhancement
Conditional routing now supports **parameter-based** routing in addition to metadata. Parameter-based routing enables dynamic, per-request optimizations, giving you better performance, cost efficiency, and control over user experience. [Read more about this here](https://portkey.ai/docs/product/ai-gateway/conditional-routing#structure-of-conditions-object)
### Gateway and Providers
### New Models
[Syngenta’s](https://www.linkedin.com/posts/miragirahul_devcon2025-syngenta-hackathon-activity-7377593161826603008-X_bn) teams ran a hackathon using Portkey + n8n, building creative workflows and experimenting with how MCP servers and digital apps can transform grower experiences.
It was inspiring to see Portkey embedded directly into their innovation process, powering hands-on experimentation and ideation at scale.
### MCP Salon
We hosted some of the most illustrious MCP builders for a closed-door roundtable. The group went deep into the technical challenges of building MCP servers and clients, serving them in production, and solving real-world adoption hurdles.
👉 To stay updated on upcoming events, subscribe to our [event calendar](https://luma.com/portkey?k=c)
## Resources
* **Blog**: [MCP Message Types: Complete MCP JSON-RPC Reference Guide](https://portkey.ai/blog/mcp-message-types-complete-json-rpc-reference-guide)
* **Blog**: [Failover routing strategies for LLMs in production](https://portkey.ai/blog/failover-routing-strategies-for-llms-in-production)
* **Partnership Blog with Feedback Intelligence**: [Tracing Failures from the LLM Call to the User Experience](https://portkey.ai/blog/tracing-failures-from-the-llm-call-to-the-user-experience)
* **Blog**: [A Strategic Perspective on the MCP Registry for the Enterprise](https://portkey.ai/blog/mcp-registry)
## Community Contributors
A special thanks to our contributor this month:
* [MarcNB256](https://github.com/MarcNB256)
## Coming this month!
Webinar - **LibreChat in Production** [Register here →](https://luma.com/cywhfpko)
## Support
Great, this is set up and ready now.
The Gemini model doesn't need a `system` prompt, so we can skip it and create a prompt like this.
## 2. Write the config for a 50-50 test
To run the experiment, let's create a [config](/product/ai-gateway/configs) in Portkey that can automatically route requests between these 2 prompts.
We pulled the `id` for both these prompts from our Prompts list page and will use them in our config. This is what it finally looks like.
```json theme={"system"}
{
"strategy": {
"mode": "loadbalance"
},
"targets": [{
"prompt_id": "0db0d89c-c1f6-44bc-a976-f92e24b39a19",
"weight": 0.5
},{
"prompt_id": "pp-blog-outli-840877",
"weight": 0.5
}]
}
```
We've created a load-balanced config that will route 50% of the traffic to each of the 2 prompt IDs mentioned in it. We can save this config and fetch its ID.
Create the config and fetch the ID
## 3. Make requests using this config
Let's use this config to start making requests from our application. We will use the [prompt completions API](/portkey-endpoints/prompts/prompt-completion) to make the requests and add the config in our request headers.
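Here's a minimal sketch of such a request (the config slug and the `topic` variable are placeholders for your own saved config and prompt variables):

```python theme={"system"}
# Minimal sketch: prompt completions routed through the saved load-balanced config
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    config="pc-blog-ab-test",  # placeholder slug of the saved load-balanced config
)

completion = portkey.prompts.completions.create(
    prompt_id="0db0d89c-c1f6-44bc-a976-f92e24b39a19",  # the config's targets decide which prompt actually runs
    variables={"topic": "AI gateways"},                # whatever variables your prompt templates expect
)
print(completion)
```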
We find that the `gpt-3.5-turbo` prompt averages 4.71 on feedback after 20 attempts, while `gemini-pro` averages 4.11. We definitely need more data and examples, but let's assume for now that we want to start directing more traffic to the `gpt-3.5-turbo` prompt.
We can edit the `weight` in the config to direct more traffic to `gpt-3.5-turbo`. The new config would look like this:
```json theme={"system"}
{
"strategy": {
"mode": "loadbalance"
},
"targets": [{
"prompt_id": "0db0d89c-c1f6-44bc-a976-f92e24b39a19",
"weight": 0.8
},{
"prompt_id": "pp-blog-outli-840877",
"weight": 0.2
}]
}
```
This directs 80% of the traffic to OpenAI.
And we're done! We were able to set up an effective A/B test between prompts and models without fretting.
## Next Steps
As next explorations, we could create versions of the prompts and test between them. We could also test 2 prompts on `gpt-3.5-turbo` to judge which one would perform better.
Try creating a prompt to create tweets and see which model or prompts perform better.
Portkey allows a lot of flexibility while experimenting with prompts.
## Bonus: Add a fallback
We've noticed that we hit the OpenAI rate limits at times. In that case, we can fall back to the Gemini prompt so the user doesn't experience a failure.
Adjust the config like this, and your fallback is set up!
```json theme={"system"}
{
"strategy": {
"mode": "loadbalance"
},
"targets": [{
"strategy": {"mode": "fallback"},
"targets": [
{
"prompt_id": "0db0d89c-c1f6-44bc-a976-f92e24b39a19",
"weight": 0.8
}, {
"prompt_id": "pp-blog-outli-840877"
}]
},{
"prompt_id": "pp-blog-outli-840877",
"weight": 0.2
}]
}
```
If you need any help in further customizing this flow, or just have more questions as you run experiments with prompts / models, please reach out to us at [hello@portkey.ai](mailto:hello@portkey.ai) (We reply fast!)
# Function Calling
Source: https://docs.portkey.ai/docs/guides/getting-started/function-calling
Get the LLM to interact with external APIs!
As described in the [Enforcing JSON Schema cookbook](/guides/use-cases/enforcing-json-schema-with-anyscale-and-together), LLMs are now good at generating outputs that follow a specified syntax. We can combine this LLM ability with their reasoning ability to let LLMs interact with external APIs. **This is called Function (or Tool) calling.** In simple terms, function calling:
1. Informs the user when a question can be answered using an external API
2. Generates a valid request in the API's format
3. Converts the API's response to a natural language answer
Function calling is currently supported on select models on **Anyscale**, **Together AI**, **Fireworks AI**, **Google Gemini**, and **OpenAI**. Using Portkey, you can easily experiment with function calling across various providers and gain confidence to ship it to production.
**Let's understand how it works with an example**:
We want the LLM to tell what's the temperature in Delhi today. We'll use a "Weather API" to fetch the weather:
```js theme={"system"}
import Portkey from "portkey-ai";
const portkey = new Portkey({
apiKey: "PORTKEY_API_KEY",
virtualKey: "ANYSCALE_VIRTUAL_KEY",
});
// Describing what the Weather API does and expects
let tools = [
{
"type": "function",
"function": {
"name": "getWeather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state",
},
"unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
},
"required": ["location"],
},
},
}
];
let messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather like in Delhi - respond in JSON"}
];

let response = await portkey.chat.completions.create({
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1",
    messages,
    tools,
    tool_choice: "auto", // auto is default, yet explicit
});

console.log(response.choices[0].finish_reason)
```
Here, we've defined what the Weather API expects for its requests in the `tools` param, and set `tool_choice` to `auto`. Based on the user messages, the LLM will decide whether it should make a function call to fulfill the request. Here, it chooses to do so, and we see the following output:
```json theme={"system"}
{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_x8we3xx",
      "type": "function",
      "function": {
        "name": "getWeather",
        "arguments": "{\n  \"location\": \"Delhi, India\",\n  \"format\": \"celsius\"\n}"
      }
    }
  ]
}
```
We can just take the tool call made by the LLM, pass its arguments to our `getWeather` function, and get a proper response to our query. We then send that response back to the LLM to complete the loop:
```js theme={"system"}
/**
 * getWeather(..) is a utility to call external weather service APIs
 * Responds with: {"temperature": 20, "unit": "celsius"}
 **/

// Grab the assistant message and the tool call it made in the previous response
let assistantMessage = response.choices[0].message;
let toolCall = assistantMessage.tool_calls[0];

let weatherData = await getWeather(JSON.parse(toolCall.function.arguments));
let content = JSON.stringify(weatherData);

// Push the assistant message and the tool result back into the conversation
messages.push(assistantMessage);
messages.push({
    role: "tool",
    content: content,
    toolCallId: toolCall.id,
    name: "getWeather"
});

let finalResponse = await portkey.chat.completions.create({
    model: "mistralai/Mixtral-8x7B-Instruct-v0.1",
    tools: tools,
    messages: messages,
    tool_choice: "auto",
});
```
We should see this final output:
```json theme={"system"}
{
"role": "assistant",
"content": "It's 30 degrees celsius in Delhi, India.",
}
```
## Function Calling Workflow
Recapping, there are 4 key steps to doing function calling, as illustrated below:
Function Calling Workflow
## Supported Models
Portkey's AI Gateway provides native function calling (also known as tool calling) support across our entire ecosystem of AI providers, including OpenAI, Anthropic, Google, Together AI, Fireworks AI, and many more. If you discover a function-calling capable LLM that isn't working with Portkey, please let us know [on Discord](https://portkey.wiki/community).
### 2. Caching, Fallbacks, Load Balancing
* **Fallbacks**: Ensure your application remains functional even if a primary service fails.
* **Load Balancing**: Efficiently distribute incoming requests among multiple models.
* **Semantic Caching**: Reduce costs and latency by intelligently caching results.
Toggle these features by saving *Configs* (from the Portkey dashboard > Configs tab).
If you want to enable semantic caching plus a fallback from Llama2 to Mistral, your Portkey config would look like this:
```json theme={"system"}
{
"cache": { "mode": "semantic" },
"strategy": { "mode": "fallback" },
"targets": [
{
"provider": "anyscale",
"api_key": "...",
"override_params": { "model": "meta-llama/Llama-2-7b-chat-hf" }
},
{
"provider": "anyscale",
"api_key": "...",
"override_params": { "model": "mistralai/Mistral-7B-Instruct-v0.1" }
}
]
}
```
Now, just send the Config ID with the `x-portkey-config` header:
```py theme={"system"}
""" OPENAI PYTHON SDK """
import openai, json
PORTKEY_GATEWAY_URL = "https://api.portkey.ai/v1"
PORTKEY_HEADERS = {
'Content-Type': 'application/json',
'x-portkey-api-key': 'PORTKEY_API_KEY',
# **************************************
'x-portkey-config': 'CONFIG_ID'
# **************************************
}
client = openai.OpenAI(base_url=PORTKEY_GATEWAY_URL, default_headers=PORTKEY_HEADERS)
response = client.chat.completions.create(
model="mistralai/Mistral-7B-Instruct-v0.1",
messages=[{"role": "user", "content": "Say this is a test"}]
)
print(response.choices[0].message.content)
```
```js theme={"system"}
// OPENAI NODE SDK
import OpenAI from 'openai';
const PORTKEY_GATEWAY_URL = "https://api.portkey.ai/v1"
const PORTKEY_HEADERS = {
'Content-Type': 'application/json',
'x-portkey-api-key': 'PORTKEY_API_KEY',
// **************************************
'x-portkey-config': 'CONFIG_ID'
// **************************************
}
const openai = new OpenAI({baseURL:PORTKEY_GATEWAY_URL, defaultHeaders:PORTKEY_HEADERS});
async function main() {
const chatCompletion = await openai.chat.completions.create({
messages: [{ role: 'user', content: 'Say this is a test' }],
model: 'mistralai/Mistral-7B-Instruct-v0.1',
});
console.log(chatCompletion.choices[0].message.content);
}
main();
```
```py theme={"system"}
""" REQUESTS LIBRARY """
import requests, json
PORTKEY_GATEWAY_URL = "https://api.portkey.ai/v1/chat/completions"
PORTKEY_HEADERS = {
'Content-Type': 'application/json',
'x-portkey-api-key': 'PORTKEY_API_KEY',
# **************************************
'x-portkey-config': 'CONFIG_ID'
# **************************************
}
DATA = {"messages": [{"role": "user", "content": "What happens when you mix red & yellow?"}]}
response = requests.post(PORTKEY_GATEWAY_URL, headers=PORTKEY_HEADERS, json=DATA)
print(response.text)
```
```sh theme={"system"}
""" CURL """
curl "https://api.portkey.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "x-portkey-api-key: PORTKEY_API_KEY" \
-H "x-portkey-config: CONFIG_ID" \
-d '{ "messages": [{"role": "user", "content": "Say 'Test'."}] }'
```
For more on Configs and other gateway features like Load Balancing, [check out the docs.](https://portkey.ai/docs/product/ai-gateway-streamline-llm-integrations)
### 3. Collect Feedback
Gather weighted feedback from users and improve your app:
```py theme={"system"}
""" REQUESTS LIBRARY """
import requests
import json
PORTKEY_FEEDBACK_URL = "https://api.portkey.ai/v1/feedback" # Portkey Feedback Endpoint
PORTKEY_HEADERS = {
"x-portkey-api-key": "PORTKEY_API_KEY",
"Content-Type": "application/json",
}
DATA = {
"trace_id": "anyscale_portkey_test", # On Portkey, you can append feedback to a particular Trace ID
"value": 1,
"weight": 0.5
}
response = requests.post(PORTKEY_FEEDBACK_URL, headers=PORTKEY_HEADERS, data=json.dumps(DATA))
print(response.text)
```
```sh theme={"system"}
""" CURL """
curl "https://api.portkey.ai/v1/feedback" \
-H "x-portkey-api-key: PORTKEY_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"trace_id": "anyscale_portkey_test",
"value": 1,
"weight": 0.5
}'
```
### 4. Continuous Fine-Tuning
Once you start logging your requests and their feedback with Portkey, it becomes very easy to 1) curate & create data for fine-tuning, 2) schedule fine-tuning jobs, and 3) use the fine-tuned models!
Fine-tuning is currently enabled for select orgs - please request access on [Portkey Discord](https://discord.gg/sDk9JaNfK8) and we'll get back to you ASAP.
#### Conclusion
Integrating Portkey with Anyscale helps you build resilient LLM apps from the get-go. With features like semantic caching, observability, load balancing, feedback, and fallbacks, you can ensure optimal performance and continuous improvement.
[Read full Portkey docs here.](https://portkey.ai/docs/) | [Reach out to the Portkey team.](https://discord.gg/sDk9JaNfK8)
# Deepinfra
Source: https://docs.portkey.ai/docs/guides/integrations/deepinfra
# Groq
Source: https://docs.portkey.ai/docs/guides/integrations/groq
# Introduction to GPT-4o
Source: https://docs.portkey.ai/docs/guides/integrations/introduction-to-gpt-4o
> This notebook is from OpenAI [Cookbooks](https://github.com/openai/openai-cookbook/blob/main/examples/gpt4o/introduction%5Fto%5Fgpt4o.ipynb), enhanced with Portkey observability and features
## The GPT-4o Model
GPT-4o ("o" for "omni") is designed to handle a combination of text, audio, and video inputs, and can generate outputs in text, audio, and image formats.
### Current Capabilities
Currently, the API supports `{text, image}` inputs only, with `{text}` outputs, the same modalities as `gpt-4-turbo`. Additional modalities, including audio, will be **introduced soon**.
This guide will help you get started with using GPT-4o for text, image, and video understanding.
## Getting Started
### Install OpenAI SDK for Python
```sh theme={"system"}
pip install --upgrade --quiet openai portkey-ai
```
### Configure the OpenAI Client
First, grab your OpenAI API key [here](https://platform.openai.com/api-keys). Now, let's start with a simple {text} input to the model for our first request. We'll use both `system` and `user` messages for our first request, and we'll receive a response from the `assistant` role.
```py theme={"system"}
from openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
import os
## Set the API key and model name
MODEL="gpt-4o"
client = OpenAI(
api_key=os.environ.get("OPENAI_API_KEY", ""),
base_url=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
provider="openai",
api_key="PORTKEY_API_KEY" # defaults to os.environ.get("PORTKEY_API_KEY")
)
)
```
```py theme={"system"}
completion = client.chat.completions.create(
model=MODEL,
messages=[
{"role": "system", "content": "You are a helpful assistant. Help me with my math homework!"}, # <-- This is the system message that provides context to the model
{"role": "user", "content": "Hello! Could you solve 2+2?"} # <-- This is the user message for which the model will generate a response
]
)
print("Assistant: " + completion.choices[0].message.content)
```
## Image Processing
GPT-4o can directly process images and take intelligent actions based on the image. We can provide images in two formats:
1. Base64 Encoded
2. URL
Let's first view the image we'll use, then try sending this image as both Base64 and as a URL link to the API
```py theme={"system"}
from IPython.display import Image, display, Audio, Markdown
import base64
IMAGE_PATH = "data/triangle.png"
# Preview image for context
display(Image(IMAGE_PATH))
```
#### Base64 Image Processing
```py theme={"system"}
# Open the image file and encode it as a base64 string
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
base64_image = encode_image(IMAGE_PATH)
response = client.chat.completions.create(
model=MODEL,
messages=[
{"role": "system", "content": "You are a helpful assistant that responds in Markdown. Help me with my math homework!"},
{"role": "user", "content": [
{"type": "text", "text": "What's the area of the triangle?"},
{"type": "image_url", "image_url": {
"url": f"data:image/png;base64,{base64_image}"}
}
]}
],
temperature=0.0,
)
print(response.choices[0].message.content)
```
#### URL Image Processing
```py theme={"system"}
response = client.chat.completions.create(
model=MODEL,
messages=[
{"role": "system", "content": "You are a helpful assistant that responds in Markdown. Help me with my math homework!"},
{"role": "user", "content": [
{"type": "text", "text": "What's the area of the triangle?"},
{"type": "image_url", "image_url": {
"url": "https://upload.wikimedia.org/wikipedia/commons/e/e2/The_Algebra_of_Mohammed_Ben_Musa_-_page_82b.png"}
}
]}
],
temperature=0.0,
)
print(response.choices[0].message.content)
```
## Video Processing
While it's not possible to directly send a video to the API, GPT-4o can understand videos if you sample frames and then provide them as images. It performs better at this task than GPT-4 Turbo.
Since GPT-4o in the API does not yet support audio-in (as of May 2024), we'll use a combination of GPT-4o and Whisper to process both the audio and visual for a provided video, and showcase two usecases:
1. Summarization
2. Question and Answering
### Setup for Video Processing
We'll use two python packages for video processing - opencv-python and moviepy.
These require [ffmpeg](https://ffmpeg.org/about.html), so make sure to install this beforehand. Depending on your OS, you may need to run `brew install ffmpeg` or `sudo apt install ffmpeg`
```sh theme={"system"}
pip install opencv-python --quiet
pip install moviepy --quiet
```
### Process the video into two components: frames and audio
```py theme={"system"}
import cv2
from moviepy.editor import VideoFileClip
import time
import base64
# We'll be using the OpenAI DevDay Keynote Recap video. You can review the video here: https://www.youtube.com/watch?v=h02ti0Bl6zk
VIDEO_PATH = "data/keynote_recap.mp4"
```
```py theme={"system"}
def process_video(video_path, seconds_per_frame=2):
base64Frames = []
base_video_path, _ = os.path.splitext(video_path)
video = cv2.VideoCapture(video_path)
total_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
fps = video.get(cv2.CAP_PROP_FPS)
frames_to_skip = int(fps * seconds_per_frame)
curr_frame=0
# Loop through the video and extract frames at specified sampling rate
while curr_frame < total_frames - 1:
video.set(cv2.CAP_PROP_POS_FRAMES, curr_frame)
success, frame = video.read()
if not success:
break
_, buffer = cv2.imencode(".jpg", frame)
base64Frames.append(base64.b64encode(buffer).decode("utf-8"))
curr_frame += frames_to_skip
video.release()
# Extract audio from video
audio_path = f"{base_video_path}.mp3"
clip = VideoFileClip(video_path)
clip.audio.write_audiofile(audio_path, bitrate="32k")
clip.audio.close()
clip.close()
print(f"Extracted {len(base64Frames)} frames")
print(f"Extracted audio to {audio_path}")
return base64Frames, audio_path
# Extract 1 frame per second. You can adjust the `seconds_per_frame` parameter to change the sampling rate
base64Frames, audio_path = process_video(VIDEO_PATH, seconds_per_frame=1)
```
```py theme={"system"}
## Display the frames and audio for context
display_handle = display(None, display_id=True)
for img in base64Frames:
display_handle.update(Image(data=base64.b64decode(img.encode("utf-8")), width=600))
time.sleep(0.025)
Audio(audio_path)
```
### Example 1: Summarization
Now that we have both the video frames and the audio, let's run a few different tests to generate a video summary to compare the results of using the models with different modalities. We should expect to see that the summary generated with context from both visual and audio inputs will be the most accurate, as the model is able to use the entire context from the video.
1. Visual Summary
2. Audio Summary
3. Visual + Audio Summary
#### Visual Summary
The visual summary is generated by sending the model only the frames from the video. With just the frames, the model is likely to capture the visual aspects, but will miss any details discussed by the speaker.
```py theme={"system"}
response = client.chat.completions.create(
model=MODEL,
messages=[
{"role": "system", "content": "You are generating a video summary. Please provide a summary of the video. Respond in Markdown."},
{"role": "user", "content": [
"These are the frames from the video.",
*map(lambda x: {"type": "image_url",
"image_url": {"url": f'data:image/jpg;base64,{x}', "detail": "low"}}, base64Frames)
],
}
],
temperature=0,
)
print(response.choices[0].message.content)
```
The model is able to capture the high level aspects of the video visuals, but misses the details provided in the speech.
#### Audio Summary
The audio summary is generated by sending the model the audio transcript. With just the audio, the model is likely to bias towards the audio content, and will miss the context provided by the presentations and visuals.
`{audio}` input for GPT-4o isn't currently available but will be coming soon! For now, we use the existing `whisper-1` model to process the audio.
```py theme={"system"}
# Transcribe the audio
transcription = client.audio.transcriptions.create(
model="whisper-1",
file=open(audio_path, "rb"),
)
## OPTIONAL: Uncomment the line below to print the transcription
#print("Transcript: ", transcription.text + "\n\n")
response = client.chat.completions.create(
model=MODEL,
messages=[
{"role": "system", "content":"""You are generating a transcript summary. Create a summary of the provided transcription. Respond in Markdown."""},
{"role": "user", "content": [
{"type": "text", "text": f"The audio transcription is: {transcription.text}"}
],
}
],
temperature=0,
)
print(response.choices[0].message.content)
```
The audio summary might be biased towards the content discussed during the speech, but comes out with much less structure than the video summary.
#### Audio + Visual Summary
The Audio + Visual summary is generated by sending the model both the visual and the audio from the video at once. When sending both of these, the model is expected to better summarize since it can perceive the entire video at once.
```py theme={"system"}
## Generate a summary with visual and audio
response = client.chat.completions.create(
model=MODEL,
messages=[
{"role": "system", "content":"""You are generating a video summary. Create a summary of the provided video and its transcript. Respond in Markdown"""},
{"role": "user", "content": [
"These are the frames from the video.",
*map(lambda x: {"type": "image_url",
"image_url": {"url": f'data:image/jpg;base64,{x}', "detail": "low"}}, base64Frames),
{"type": "text", "text": f"The audio transcription is: {transcription.text}"}
],
}
],
temperature=0,
)
print(response.choices[0].message.content)
```
After combining both the video and audio, you'll be able to get a much more detailed and comprehensive summary for the event which uses information from both the visual and audio elements from the video.
### Example 2: Question and Answering
For the Q\&A, we'll use the same concept as before to ask questions of our processed video while running the same 3 tests to demonstrate the benefit of combining input modalities:
1. Visual Q\&A
2. Audio Q\&A
3. Visual + Audio Q\&A
```py theme={"system"}
QUESTION = "Question: Why did Sam Altman have an example about raising windows and turning the radio on?"
```
```py theme={"system"}
qa_visual_response = client.chat.completions.create(
model=MODEL,
messages=[
{"role": "system", "content": "Use the video to answer the provided question. Respond in Markdown."},
{"role": "user", "content": [
"These are the frames from the video.",
*map(lambda x: {"type": "image_url", "image_url": {"url": f'data:image/jpg;base64,{x}', "detail": "low"}}, base64Frames),
QUESTION
],
}
],
temperature=0,
)
print("Visual QA:\n" + qa_visual_response.choices[0].message.content)
```
> ```
> Visual QA:
>
> Sam Altman used the example about raising windows and turning the radio on to demonstrate the function calling capability of GPT-4 Turbo. The example illustrated how the model can interpret and execute multiple commands in a more structured and efficient manner. The "before" and "after" comparison showed how the model can now directly call functions like `raise_windows()` and `radio_on()` based on natural language instructions, showcasing improved control and functionality.
> ```
```py theme={"system"}
qa_audio_response = client.chat.completions.create(
model=MODEL,
messages=[
{"role": "system", "content":"""Use the transcription to answer the provided question. Respond in Markdown."""},
{"role": "user", "content": f"The audio transcription is: {transcription.text}. \n\n {QUESTION}"},
],
temperature=0,
)
print("Audio QA:\n" + qa_audio_response.choices[0].message.content)
```
> ```
> Audio QA:
>
> The provided transcription does not include any mention of Sam Altman or an example about raising windows and turning the radio on. Therefore, I cannot provide an answer based on the given transcription.
> ```
```py theme={"system"}
qa_both_response = client.chat.completions.create(
model=MODEL,
messages=[
{"role": "system", "content":"""Use the video and transcription to answer the provided question."""},
{"role": "user", "content": [
"These are the frames from the video.",
*map(lambda x: {"type": "image_url",
"image_url": {"url": f'data:image/jpg;base64,{x}', "detail": "low"}}, base64Frames),
{"type": "text", "text": f"The audio transcription is: {transcription.text}"},
QUESTION
],
}
],
temperature=0,
)
print("Both QA:\n" + qa_both_response.choices[0].message.content)
```
> ```
> Both QA:
>
> Sam Altman used the example of raising windows and turning the radio on to demonstrate the improved function calling capabilities of GPT-4 Turbo. The example illustrated how the model can now handle multiple function calls more effectively and follow instructions better. In the "before" scenario, the model had to be prompted separately for each action, whereas in the "after" scenario, the model could handle both actions in a single prompt, showcasing its enhanced ability to manage and execute multiple tasks simultaneously.
> ```
Comparing the three answers, the most accurate one is generated by using both the audio and visual from the video. Sam Altman did not discuss raising windows or turning the radio on during the keynote, but referenced an improved capability for the model to execute multiple functions in a single request while the examples were shown behind him.
## Conclusion
Integrating multiple input modalities, such as audio, visual, and text, significantly enhances the performance of the model across a diverse range of tasks. This multimodal approach allows for more comprehensive understanding and interaction, mirroring more closely how humans perceive and process information.
# Langchain
Source: https://docs.portkey.ai/docs/guides/integrations/langchain
### You will need Portkey and Together AI API keys to get started
| Grab [Portkey API Key](https://app.portkey.ai/) | Grab [Together AI API Key](https://api.together.xyz/settings/api-keys) |
| ----------------------------------------------- | ---------------------------------------------------------------------- |
```sh theme={"system"}
pip install -qU portkey-ai openai
```
## With OpenAI Client
```py theme={"system"}
from openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
openai = OpenAI(
api_key= 'TOGETHER_API_KEY', ## Grab from https://api.together.xyz/
base_url=PORTKEY_GATEWAY_URL,
default_headers=createHeaders(
provider="together-ai",
api_key= 'PORTKEY_API_KEY' ## Grab from https://app.portkey.ai/
)
)
response = openai.chat.completions.create(
model="meta-llama/Llama-3-8b-chat-hf",
messages=[{"role": "user", "content": "What's a fractal?"}],
max_tokens=500
)
print(response.choices[0].message.content)
```
## With Portkey Client
You can safely store your Together API key in [Portkey](https://app.portkey.ai/) and access models using Portkey's Virtual Key
```py theme={"system"}
from portkey_ai import Portkey
portkey = Portkey(
api_key = 'PORTKEY_API_KEY', ## Grab from https://app.portkey.ai/
provider="@together-virtual-key" ## Grab from https://api.together.xyz/ and add to Portkey Virtual Keys
)
response = portkey.chat.completions.create(
model= 'meta-llama/Llama-3-8b-chat-hf',
messages= [{ "role": 'user', "content": 'Who are you?'}],
max_tokens=500
)
print(response.choices[0].message.content)
```
## Monitoring your Requests
Using Portkey, you can monitor your Llama 3 requests and track tokens, cost, latency, and more.
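As a quick sketch (the trace ID and metadata values are illustrative), you can attach a trace ID and custom metadata to each request so it's easy to find and segment on the dashboard:

```py theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(
    api_key='PORTKEY_API_KEY',
    provider="@together-virtual-key"  # the virtual key created above
)

# Attach a trace ID and metadata tags - these show up against the request in the Portkey dashboard
response = portkey.with_options(
    trace_id="llama3-demo-001",
    metadata={"_user": "john_doe", "env": "staging"}
).chat.completions.create(
    model='meta-llama/Llama-3-8b-chat-hf',
    messages=[{"role": "user", "content": "What's a fractal?"}],
    max_tokens=500
)
print(response.choices[0].message.content)
```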
# Mistral
Source: https://docs.portkey.ai/docs/guides/integrations/mistral
Portkey helps bring Mistral's APIs to production with its observability suite & AI Gateway.
Use the Mistral API **through** Portkey for:
1. **Enhanced Logging**: Track API usage with detailed insights and custom segmentation.
2. **Production Reliability**: Automated fallbacks, load balancing, retries, timeouts, and caching.
3. **Continuous Improvement**: Collect and apply user feedback.
### 1.1 Setup & Logging
1. Obtain your [**Portkey API Key**](https://app.portkey.ai/).
2. Set `$ export PORTKEY_API_KEY=PORTKEY_API_KEY`
3. Set `$ export MISTRAL_API_KEY=MISTRAL_API_KEY`
4. `pip install portkey-ai` or `npm i portkey-ai`
```py theme={"system"}
""" OPENAI PYTHON SDK """
from portkey_ai import Portkey
portkey = Portkey(
api_key="PORTKEY_API_KEY",
# ************************************
provider="mistral-ai",
Authorization="Bearer MISTRAL_API_KEY"
# ************************************
)
response = portkey.chat.completions.create(
model="mistral-tiny",
messages = [{ "role": "user", "content": "c'est la vie" }]
)
```
```js theme={"system"}
import Portkey from 'portkey-ai';
const portkey = new Portkey({
apiKey: "PORTKEY_API_KEY",
// ***********************************
provider: "mistral-ai",
Authorization: "Bearer MISTRAL_API_KEH"
// ***********************************
})
async function main(){
const response = await portkey.chat.completions.create({
model: "mistral-tiny",
messages: [{ role: 'user', content: "c'est la vie" }]
});
}
main()
```
### 1.2. Enhanced Observability
* **Trace** requests with a single ID.
* **Append custom tags** for request segmenting & in-depth analysis.
Just add their relevant headers to your request:
```py theme={"system"}
from portkey_ai import Portkey
portkey = Portkey(
api_key="PORTKEY_API_KEY",
provider="mistral-ai",
Authorization="Bearer MISTRAL_API_KEY"
)
response = portkey.with_options(
# ************************************
trace_id="ux5a7",
metadata={"user": "john_doe"}
# ************************************
).chat.completions.create(
model="mistral-tiny",
messages = [{ "role": "user", "content": "c'est la vie" }]
)
```
```js theme={"system"}
import Portkey from 'portkey-ai';
const portkey = new Portkey({
apiKey: "PORTKEY_API_KEY",
provider: "mistral-ai",
Authorization: "Bearer MISTRAL_API_KEH"
})
async function main(){
const response = await portkey.chat.completions.create({
model: "mistral-tiny",
messages: [{ role: 'user', content: "c'est la vie" }]
},{
// ***********************************
traceID: "ux5a7",
metadata: {"user": "john_doe"}
});
}
main()
```
Here’s how your logs will appear on your Portkey dashboard:
### 2. Caching, Fallbacks, Load Balancing
* **Fallbacks**: Ensure your application remains functional even if a primary service fails.
* **Load Balancing**: Efficiently distribute incoming requests among multiple models.
* **Semantic Caching**: Reduce costs and latency by intelligently caching results.
Toggle these features by saving *Configs* (from the Portkey dashboard > Configs tab).
If you want to enable semantic caching plus a fallback from Mistral-Medium to Mistral-Tiny, your Portkey config would look like this:
```json theme={"system"}
{
"cache": {"mode": "semantic"},
"strategy": {"mode": "fallback"},
"targets": [
{
"provider": "mistral-ai", "api_key": "...",
"override_params": {"model": "mistral-medium"}
},
{
"provider": "mistral-ai", "api_key": "...",
"override_params": {"model": "mistral-tiny"}
}
]
}
```
Now, just set the Config ID while instantiating Portkey:
```py theme={"system"}
""" OPENAI PYTHON SDK """
from portkey_ai import Portkey
portkey = Portkey(
api_key="PORTKEY_API_KEY",
# ************************************
config="pp-mistral-cache-xx"
# ************************************
)
response = portkey.chat.completions.create(
model="mistral-tiny",
messages = [{ "role": "user", "content": "c'est la vie" }]
)
```
```js theme={"system"}
import Portkey from 'portkey-ai';
const portkey = new Portkey({
apiKey: "PORTKEY_API_KEY",
// ***********************************
config: "pp-mistral-cache-xx"
// ***********************************
})
async function main(){
const response = await portkey.chat.completions.create({
model: "mistral-tiny",
messages: [{ role: 'user', content: "c'est la vie" }]
});
}
main()
```
For more on Configs and other gateway features like Load Balancing, [check out the docs.](https://portkey.ai/docs/product/ai-gateway-streamline-llm-integrations)
### 3. Collect Feedback
Gather weighted feedback from users and improve your app:
```py theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY"
)

def send_feedback():
    portkey.feedback.create(
        trace_id="REQUEST_TRACE_ID",
        value=0  # For thumbs down
    )

send_feedback()
```
```js theme={"system"}
import Portkey from 'portkey-ai';
const portkey = new Portkey({
apiKey: "PORTKEY_API_KEY"
});
const sendFeedback = async () => {
await portkey.feedback.create({
traceID: "REQUEST_TRACE_ID",
value: 1 // For thumbs up
});
}
await sendFeedback();
```
#### Conclusion
Integrating Portkey with Mistral helps you build resilient LLM apps from the get-go. With features like semantic caching, observability, load balancing, feedback, and fallbacks, you can ensure optimal performance and continuous improvement.
[Read full Portkey docs here.](https://portkey.ai/docs/) | [Reach out to the Portkey team.](https://discord.gg/sDk9JaNfK8)
# Mixtral 8x22b
Source: https://docs.portkey.ai/docs/guides/integrations/mixtral-8x22b
# Sync Open WebUI Feedback → Portkey
Source: https://docs.portkey.ai/docs/guides/integrations/openwebui-to-portkey
How to export thumbs-up/down from Open WebUI and ingest into Portkey using a one-file Python or Node script.
# Vercel AI
Source: https://docs.portkey.ai/docs/guides/integrations/vercel-ai
Portkey is a control panel for your Vercel AI app. It makes your LLM integrations prod-ready, reliable, fast, and cost-efficient.
Use Portkey with your Vercel app for:
1. Calling 100+ LLMs (open & closed)
2. Logging & analysing LLM usage
3. Caching responses
4. Automating fallbacks, retries, timeouts, and load balancing
5. Managing, versioning, and deploying prompts
6. Continuously improving app with user feedback
## Guide: Create a Portkey + OpenAI Chatbot
### 1. Create a NextJS app
Go ahead and create a Next.js application, and install `ai`, `@ai-sdk/openai`, and `portkey-ai` as dependencies.
```sh theme={"system"}
pnpm dlx create-next-app my-ai-app
cd my-ai-app
pnpm install ai @ai-sdk/openai portkey-ai
```
### 2. Add Authentication keys to `.env`
1. Login to Portkey [here](https://app.portkey.ai/)
2. To integrate OpenAI with Portkey, add your OpenAI API key to Portkey’s Virtual Keys
3. This will give you a disposable key that you can use and rotate instead of directly using the OpenAI API key
4. Grab the Virtual key & your Portkey API key and add them to `.env` file:
```sh theme={"system"}
# ".env"
PORTKEY_API_KEY="xxxxxxxxxx"
OPENAI_VIRTUAL_KEY="xxxxxxxxxx"
```
### 3. Create Route Handler
Create a Next.js Route Handler that utilizes the Edge Runtime to generate a chat completion. Stream back to Next.js.
For this example, create a route handler at `app/api/chat/route.ts` that calls GPT-3.5 Turbo and accepts a `POST` request with a messages array of strings:
```ts theme={"system"}
// filename="app/api/chat/route.ts"
import { streamText } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';
import { createHeaders, PORTKEY_GATEWAY_URL } from 'portkey-ai';
// Create an OpenAI client
const client = createOpenAI({
baseURL: PORTKEY_GATEWAY_URL,
apiKey: "xx",
headers: createHeaders({
apiKey: "PORTKEY_API_KEY",
virtualKey: "OPENAI_VIRTUAL_KEY"
}),
})
// Set the runtime to edge for best performance
export const runtime = 'edge';
export async function POST(req: Request) {
const { messages } = await req.json();
// Invoke Chat Completion
const response = await streamText({
model: client('gpt-3.5-turbo'),
messages
})
// Respond with the stream
return response.toTextStreamResponse();
}
```
Portkey follows the same signature as OpenAI SDK but extends it to work with **100+ LLMs**. Here, the chat completion call will be sent to the `gpt-3.5-turbo` model, and the response will be streamed to your Next.js app.
### 4. Switch from OpenAI to Anthropic
Portkey is powered by an [open-source, universal AI Gateway](https://github.com/portkey-ai/gateway) with which you can route to 100+ LLMs using the same, known OpenAI spec.
Let’s see how you can switch from `gpt-3.5-turbo` to Claude 3 Opus by updating just a few lines of code (without breaking anything else).
1. Add your Anthropic API key or AWS Bedrock secrets to Portkey’s Virtual Keys
2. Update the virtual key while instantiating your Portkey client
3. Update the model name while making your `/chat/completions` call
4. Add maxTokens field inside streamText invocation (Anthropic requires this field)
Let’s see it in action:
```ts theme={"system"}
const client = createOpenAI({
baseURL: PORTKEY_GATEWAY_URL,
apiKey: "xx",
headers: createHeaders({
apiKey: "PORTKEY_API_KEY",
virtualKey: "ANTHROPIC_VIRTUAL_KEY"
}),
})
// Set the runtime to edge for best performance
export const runtime = 'edge';
export async function POST(req: Request) {
const { messages } = await req.json();
// Invoke Chat Completion
const response = await streamText({
model: client('claude-3-opus-20240229'),
messages,
maxTokens: 200
})
// Respond with the stream
return response.toTextStreamResponse();
}
```
### 5. Switch to Gemini 1.5
Similarly, you can just add your [Google AI Studio API key](https://aistudio.google.com/app/) to Portkey and call Gemini 1.5:
```ts theme={"system"}
const client = createOpenAI({
baseURL: PORTKEY_GATEWAY_URL,
apiKey: "xx",
headers: createHeaders({
apiKey: "PORTKEY_API_KEY",
virtualKey: "GEMINI_VIRTUAL_KEY"
}),
})
// Set the runtime to edge for best performance
export const runtime = 'edge';
export async function POST(req: Request) {
const { messages } = await req.json();
// Invoke Chat Completion
const response = await streamText({
model: client('gemini-1.5-flash'),
messages
})
// Respond with the stream
return response.toTextStreamResponse();
}
```
The same will follow for all the other providers like **Azure**, **Mistral**, **Anyscale**, **Together**, and [more](https://docs.portkey.ai/docs/provider-endpoints/supported-providers).
### 6. Wire up the UI
Let's create a Client component that will have a form to collect the prompt from the user and stream back the completion. The `useChat` hook will, by default, use the `POST` Route Handler we created earlier (`/api/chat`). However, you can override this by passing an `api` prop to `useChat` (`{ api: '...' }`).
```tsx theme={"system"}
// "app/page.tsx"
'use client';

import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat();
  return (
    <div>
      {messages.map((m) => (
        <div key={m.id}>
          {m.role === 'user' ? 'User: ' : 'AI: '}
          {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} placeholder="Say something..." onChange={handleInputChange} />
      </form>
    </div>
  );
}
```
### 7. Log the Requests
Portkey logs all the requests you’re sending to help you debug errors, and get request-level + aggregate insights on costs, latency, errors, and more.
You can enhance the logging by tracing certain requests, passing custom metadata or user feedback.
**Segmenting Requests with Metadata**
While creating the client, you can pass any `{"key": "value"}` pairs inside the metadata header. Portkey segments the requests based on the metadata to give you granular insights.
```ts theme={"system"}
const client = createOpenAI({
baseURL: PORTKEY_GATEWAY_URL,
apiKey: "xx",
headers: createHeaders({
apiKey: {PORTKEY_API_KEY},
virtualKey: {GEMINI_VIRTUAL_KEY},
metadata: {
_user: 'john doe',
organization_name: 'acme',
custom_key: 'custom_value'
}
}),
})
```
Learn more about [tracing](https://portkey.ai/docs/product/observability/traces) and [feedback](https://portkey.ai/docs/product/observability/feedback).
## Guide: Handle OpenAI Failures
### 1. Solve 5xx, 4xx Errors
Portkey helps you automatically trigger a call to any other LLM/provider in case of primary failures. [Create](https://portkey.ai/docs/product/ai-gateway-streamline-llm-integrations/configs) a fallback logic with Portkey’s Gateway Config.
For example, for setting up a fallback from OpenAI to Anthropic, the Gateway Config would be:
```json theme={"system"}
{
"strategy": { "mode": "fallback" },
"targets": [{ "virtual_key": "openai-virtual-key" }, { "virtual_key": "anthropic-virtual-key" }]
}
```
You can save this Config in the Portkey app and get an associated Config ID that you can pass while instantiating your LLM client:
### 2. Apply Config to the Route Handler
```ts theme={"system"}
const client = createOpenAI({
baseURL: PORTKEY_GATEWAY_URL,
apiKey: "xx",
headers: createHeaders({
apiKey: {PORTKEY_API_KEY},
config: {CONFIG_ID}
}),
})
```
### 3. Handle Rate Limit Errors
You can load balance your requests across multiple LLMs or accounts and prevent any one account from hitting rate limit thresholds.
For example, to route your requests between 1 OpenAI and 2 Azure OpenAI accounts:
```json theme={"system"}
{
"strategy": { "mode": "loadbalance" },
"targets": [
{ "virtual_key": "openai-virtual-key", "weight": 1 },
{ "virtual_key": "azure-virtual-key-1", "weight": 1 },
{ "virtual_key": "azure-virtual-key-2", "weight": 1 }
]
}
```
Save this Config in the Portkey app and pass it while instantiating the LLM Client, just like we did above.
Portkey can also trigger [automatic retries](https://portkey.ai/docs/product/ai-gateway-streamline-llm-integrations/automatic-retries), set [request timeouts](https://portkey.ai/docs/product/ai-gateway-streamline-llm-integrations/request-timeouts), and more.
## Guide: Cache Semantically Similar Requests
Portkey can save LLM costs & reduce latencies 20x by storing responses for semantically similar queries and serving them from cache.
For Q\&A use cases, cache hit rates go as high as 50%. To enable semantic caching, just set the `cache` `mode` to `semantic` in your Gateway Config:
```json theme={"system"}
{
"cache": { "mode": "semantic" }
}
```
Same as above, you can save your cache Config in the Portkey app, and reference the Config ID while instantiating the LLM Client.
Moreover, you can set the `max-age` of the cache and force refresh a cache. See the [docs](https://portkey.ai/docs/product/ai-gateway-streamline-llm-integrations/cache-simple-and-semantic) for more information.
## Guide: Manage Prompts Separately
Storing prompt templates and instructions in code is messy. Using Portkey, you can create and manage all of your app’s prompts in a single place and directly hit our prompts API to get responses. Here’s more on [what Prompts on Portkey can do](https://portkey.ai/docs/product/prompt-library).
To create a Prompt Template,
1. From the Dashboard, Open **Prompts**
2. In the **Prompts** page, Click **Create**
3. Add your instructions and variables, modify the model parameters if needed, and click **Save**
### Trigger the Prompt in the Route Handler
```ts theme={"system"}
import Portkey from 'portkey-ai'
import { streamText } from 'ai';

const portkey = new Portkey({
    apiKey: "PORTKEY_API_KEY"
})

// `client` below is the createOpenAI client configured with Portkey headers in the earlier route handler
export async function POST(req: Request) {
const { movie } = await req.json();
const moviePromptRender = await portkey.prompts.render({
promptID: "PROMPT_ID",
variables: { "movie": movie }
})
const messages = moviePromptRender.data.messages
const response = await streamText({
model: client('gemini-1.5-flash'),
messages
})
return response.toTextStreamResponse();
}
```
See [docs](https://portkey.ai/docs/api-reference/prompts/prompt-completion) for more information.
## Talk to the Developers
If you have any questions or issues, reach out to us on [Discord here](https://portkey.ai/community). On Discord, you will also meet many other practitioners who are putting their Vercel AI + Portkey app to production.
# null
Source: https://docs.portkey.ai/docs/guides/prompts
```js theme={"system"}
[
{
"content": "You're a helpful assistant.",
"role": "system"
},
{{chat_history}}
]
```
### Step 2: Create a Variable to Store Conversation History
In the Portkey UI, set the variable type: Look for two icons next to the variable name: "T" and "\{...}". Click the "\{...}" icon to switch to **JSON** **mode**.
**Initialize the variable:** This array will store the conversation history, allowing your chatbot to maintain context. We can just initialize the variable with `[]`.
### Step 3: Implementing the Chatbot
Use Portkey's API to generate responses based on your prompt template. Here's a Python example:
```py theme={"system"}
from portkey_ai import Portkey
client = Portkey(
api_key="YOUR_PORTKEY_API_KEY" # You can also set this as an environment variable
)
def generate_response(conversation_history):
prompt_completion = client.prompts.completions.create(
prompt_id="YOUR_PROMPT_ID", # Replace with your actual prompt ID
variables={
"variable": conversation_history
}
)
return prompt_completion.choices[0].message.content
# Example usage
conversation_history = [
{
"content": "Hello, how can I assist you today?",
"role": "assistant"
},
{
"content": "What's the weather like?",
"role": "user"
}
]
response = generate_response(conversation_history)
print(response)
```
### Step 4: Append the Response
After generating a response, append it to your conversation history:
```py theme={"system"}
def append_response(conversation_history, response):
conversation_history.append({
"content": response,
"role": "assistant"
})
return conversation_history
# Continuing from the previous example
conversation_history = append_response(conversation_history, response)
```
### Step 5: Take User Input to Continue the Conversation
Implement a loop to continuously take user input and generate responses:
```python theme={"system"}
# Continue the conversation
while True:
user_input = input("You: ")
if user_input.lower() == 'exit':
break
conversation_history.append({
"content": user_input,
"role": "user"
})
response = generate_response(conversation_history)
conversation_history = append_response(conversation_history, response)
print("Bot:", response)
print("Conversation ended.")
```
### Complete Example
Here's a complete example that puts all these steps together:
```py theme={"system"}
from portkey_ai import Portkey
client = Portkey(
api_key="YOUR_PORTKEY_API_KEY"
)
def generate_response(conversation_history):
prompt_completion = client.prompts.completions.create(
prompt_id="YOUR_PROMPT_ID",
variables={
"variable": conversation_history
}
)
return prompt_completion.choices[0].message.content
def append_response(conversation_history, response):
conversation_history.append({
"content": response,
"role": "assistant"
})
return conversation_history
# Initial conversation
conversation_history = [
{
"content": "Hello, how can I assist you today?",
"role": "assistant"
}
]
# Generate and append response
response = generate_response(conversation_history)
conversation_history = append_response(conversation_history, response)
print("Bot:", response)
# Continue the conversation
while True:
user_input = input("You: ")
if user_input.lower() == 'exit':
break
conversation_history.append({
"content": user_input,
"role": "user"
})
response = generate_response(conversation_history)
conversation_history = append_response(conversation_history, response)
print("Bot:", response)
print("Conversation ended.")
```
## Conclusion
Voilà! You've successfully set up your chatbot using Portkey's prompt templates. Portkey enables you to experiment with various LLM providers. It acts as a definitive source of truth for your team, and it versions each snapshot of model parameters, allowing for easy rollback. Here's a snapshot of the Prompt Management UI. To learn more about Prompt Management [**click here**](/product/prompt-library).
# Optimizing Prompts for Customer Support using Portkey | Llama Prompt Ops Integration
Source: https://docs.portkey.ai/docs/guides/prompts/llama-prompts
Llama Prompt Ops is a Python package that automatically optimizes prompts for Llama models. It transforms prompts that work well with other LLMs into prompts that are optimized for Llama models, improving performance and reliability.
This guide shows you how to combine Llama Prompt Ops with Portkey to optimize prompts for Llama models using enterprise-grade LLM infrastructure. You'll build a system that analyzes support messages to extract urgency, sentiment, and relevant service categories.
### 2. Advanced Logs
Portkey's logging dashboard provides detailed logs for every request made to your LLMs. These logs include:
* Complete request and response tracking
* Metadata tags for filtering
* Cost attribution and much more...
### 3. Unified Access to 1600+ LLMs
You can easily switch between 1600+ LLMs. Call various LLMs such as Anthropic, Gemini, Mistral, Azure OpenAI, Google Vertex AI, AWS Bedrock, and many more by simply changing the `virtual_key` in your default `config` object.
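For illustration, here's a minimal sketch of switching providers by changing only the virtual key; the slugs and model name below are placeholders for keys you've created in Portkey:

```py theme={"system"}
from portkey_ai import Portkey

# Same calling code, different provider - only the virtual key slug changes
anthropic = Portkey(api_key="PORTKEY_API_KEY", provider="@anthropic-virtual-key")
bedrock = Portkey(api_key="PORTKEY_API_KEY", provider="@aws-bedrock-virtual-key")

response = anthropic.chat.completions.create(
    model="claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Summarize this support ticket in one line: 'My order arrived damaged.'"}]
)
print(response.choices[0].message.content)
```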
### 4. Advanced Metadata Tracking
Using Portkey, you can add custom metadata to your LLM requests for detailed tracking and analytics. Use metadata tags to filter logs, track usage, and attribute costs across departments and teams.
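Here's a small sketch of attaching such metadata to a request; the tag names and values are illustrative:

```py theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", provider="@your-provider-virtual-key")

# Illustrative metadata tags: use them to filter logs and attribute costs by team or environment
response = portkey.with_options(
    metadata={"_user": "support-bot", "team": "customer-support", "env": "production"}
).chat.completions.create(
    model="your_chosen_model",
    messages=[{"role": "user", "content": "My order never arrived and I need a refund urgently."}]
)
```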
Here's an example of what your company info partial might look like:
Here's an example of evaluation guidelines:
Here's what your examples partial might look like:
Here's what your main prompt template should look like:
One of Portkey's key advantages is its built-in observability. Each evaluation generates detailed traces showing execution time and token usage, input and output logs for debugging, and performance metrics across evaluations.
This visibility helps you identify performance bottlenecks, track costs as you scale, debug problematic evaluations, and compare different judge prompt versions.
## Visualizing Evaluation Results on the Portkey Dashboard
The feedback data we collect using the `portkey.feedback.create()` method automatically appears in the Portkey dashboard, allowing you to:
1. Track evaluation outcomes over time
2. Identify specific areas where your agent consistently struggles
3. Measure improvement after making changes to your AI agent
4. Share results with stakeholders through customizable reports
The dashboard gives you a bird's-eye view of your evaluation metrics, making it easy to spot trends and areas for improvement.
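For reference, here's a minimal sketch of recording one evaluation outcome as feedback; the trace ID and values are illustrative, and the fields mirror Portkey's feedback API:

```py theme={"system"}
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

# Attach the evaluation result to the trace of the request that was evaluated
portkey.feedback.create(
    trace_id="agent-eval-run-42",  # illustrative: trace ID of the evaluated request
    value=1,                       # e.g. 1 = evaluation passed, 0 = failed
    weight=1.0
)
```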
## Running Evaluation on Scale
### The Problem: Generic Sales Outreach Doesn't Work
Dear John,
I hope this email finds you well. I wanted to reach out about our security services that might be of interest to YMU Talent Agency.
Our company provides security personnel for events. We have many satisfied customers and would like to schedule a call to discuss how we can help you.
Let me know when you're available.
Regards,
Sales Rep
Subject: Quick security solution for YMU's talent events
Hi John,
I noticed YMU's been expanding its roster of A-list talent lately – congrats on that growth. Having worked event security for talent agencies before, I know how challenging it can be coordinating reliable security teams, especially on short notice.
We've built something I think you'll find interesting – an on-demand security platform that's already being used by several major talent agencies.
Best,
Ilya
3. **Add product offering**:
Next, we'll add a section that will receive your company's offering details from a variable:
We'll send this variable's content at runtime.
4. **Add Prospect Information Section**:
Now let's add a section that will receive the prospect information variables:
We'll send these values at runtime as well.
5. **Create Agent-Specific Sections with Conditional Logic**:
This is where the magic happens! We'll add three "conditional sections" that only appear when a specific mode is activated:
*A. Research Query Generation Mode:*
Here, we'll explain how the research query should be generated.
At this stage, we can send a request to the researcher and get the research output back.
*B. Email Drafting Mode (add this section next):*
Once we have the research output, we can create the first email, and add the following to a new user role in the prompt template:
We'll take this email and send it to the evaluator, which will send back a JSON with two keys: "score" and "comment".
*C. Email Refinement Mode (add this final section):*
With the Evaluator's output, we'll now create the final email.
The evaluator is like an AI sales manager reviewing drafts before they go out, ensuring consistent quality at scale.
**Enter Portkey:** A unified, open source API for accessing over 200 LLMs. Portkey makes it a breeze to call the models on the LMSYS leaderboard - no setup required.
***
In this notebook, you'll see how Portkey streamlines LLM evaluation for the **Top 10 LMSYS Models**, giving you valuable insights into cost, performance, and accuracy metrics.
Let's dive in!
***
#### Video Guide
The notebook comes with a video guide that you can follow along
#### Setting up Portkey
To get started, install the necessary packages:
```sh theme={"system"}
!pip install -qU portkey-ai openai
```
Next, sign up for a Portkey API key at [https://app.portkey.ai/](https://app.portkey.ai/). Navigate to "Settings" -> "API Keys" and create an API key with the appropriate scope.
#### Defining the Top 10 LMSYS Models
Let's define the list of Top 10 LMSYS models and their corresponding providers.
```py theme={"system"}
top_10_models = [
["gpt-4o-2024-05-13", "openai"],
["gemini-1.5-pro-latest", "google"],
## ["gemini-advanced-0514","google"], # This model is not available on a public API
["gpt-4-turbo-2024-04-09", "openai"],
["gpt-4-1106-preview","openai"],
["claude-3-opus-20240229", "anthropic"],
["gpt-4-0125-preview","openai"],
## ["yi-large-preview","01-ai"], # This model is not available on a public API
["gemini-1.5-flash-latest", "google"],
["gemini-1.0-pro", "google"],
["meta-llama/Llama-3-70b-chat-hf", "together"],
["claude-3-sonnet-20240229", "anthropic"],
["reka-core-20240501","reka-ai"],
["command-r-plus", "cohere"],
["gpt-4-0314", "openai"],
["glm-4","zhipu"],
## ["qwen-max-0428","qwen"] # This model is not available outside of China
]
```
#### Add Provider API Keys to Portkey Vault
All the providers above are integrated with Portkey, which means you can add their API keys to the Portkey vault, get a corresponding **Virtual Key** for each, and streamline API key management.
| Provider | Link to get API Key | Payment Mode |
| ----------- | ---------------------------------------------------------------- | ---------------------------------------- |
| openai | [https://platform.openai.com/](https://platform.openai.com/) | Wallet Top Up |
| anthropic | [https://console.anthropic.com/](https://console.anthropic.com/) | Wallet Top Up |
| google | [https://aistudio.google.com/](https://aistudio.google.com/) |
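The snippet below calls a `print_model_outputs` helper. Here's a minimal sketch of what such a helper could look like, assuming you keep each provider's virtual key slug in a small dictionary (the slugs are placeholders):

```py theme={"system"}
from portkey_ai import Portkey

# Placeholder virtual key slugs - replace with the ones generated in your Portkey vault
virtual_keys = {
    "openai": "@openai-virtual-key",
    "anthropic": "@anthropic-virtual-key",
    "google": "@google-virtual-key",
    "together": "@together-virtual-key",
    # ...add entries for the remaining providers in top_10_models
}

def print_model_outputs(prompt):
    for model, provider_name in top_10_models:
        client = Portkey(
            api_key="PORTKEY_API_KEY",
            provider=virtual_keys[provider_name]
        )
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=256
        )
        print(f"{model} ({provider_name}):\n{response.choices[0].message.content}\n")
```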
```py theme={"system"}
prompt = "If 20 shirts take 5 hours to dry, how much time will 100 shirts take to dry?"
print_model_outputs(prompt)
```
#### Conclusion
With minimal setup and code modifications, Portkey enables you to streamline your LLM evaluation process and easily call 1600+ LLMs to find the best model for your specific use case.
Explore Portkey further and integrate it into your own projects. Visit the Portkey documentation at [https://docs.portkey.ai/](https://docs.portkey.ai/) for more information on how to leverage Portkey's capabilities in your workflow.
# Comparing DeepSeek Models Against OpenAI, Anthropic & More Using Portkey
Source: https://docs.portkey.ai/docs/guides/use-cases/deepseek-r1
DeepSeek R1 has emerged as a groundbreaking open-source AI model, challenging proprietary solutions with its MIT-licensed availability and state-of-the-art performance.
It has outperformed the top models from each provider in almost all major benchmarks. New models break records regularly, but what makes DeepSeek R1 stand out is that its code and training weights are open source, and it was trained at a fraction of the cost of comparable models.
While its Chinese origins initially raised data sovereignty concerns, major cloud providers have rapidly integrated DeepSeek R1, making it globally accessible through compliant channels.
In this guide, we will explore:
* How to access DeepSeek R1 through different providers
* Real-world performance comparisons with top models from each provider
* Implementation patterns for various use cases
All of this is made possible through Portkey's AI Gateway, which provides a unified API for accessing DeepSeek R1 across multiple providers
## Accessing DeepSeek R1 Through Multiple Providers
DeepSeek R1 is available across several major cloud providers, and with Portkey's unified API, the implementation remains consistent regardless of your chosen provider. All you need is the appropriate virtual key for your desired provider.
### Basic Implementation
```python theme={"system"}
from portkey_ai import Portkey
# Initialize Portkey client
client = Portkey(
api_key="your-portkey-api-key",
provider="@provider-virtual-key" # Just change this to switch providers
)
# Make completion call - same code for all providers
response = client.chat.completions.create(
model="deepseek-ai/DeepSeek-R1",
messages=[
{"role": "user", "content": "Your prompt here"}
]
)
```
### Available Providers and Models
#### Together AI
* `DeepSeek-R1`
* `DeepSeek R1 Distill Llama 70B`
* `DeepSeek R1 Distill Qwen 1.5B`
* `DeepSeek R1 Distill Qwen 14B`
* `DeepSeek-V3`
#### Groq
* `DeepSeek R1 Distill Llama 70B`
#### Cerebras
* `DeepSeek R1 Distill Llama 70B`
#### Fireworks
* `DeepSeek R1 671B`
#### Azure OpenAI
* `DeepSeek R1 671B`
#### AWS Bedrock
* `DeepSeek R1 671B`
### Accessing DeepSeek Models Across Providers
Portkey provides a unified API for accessing DeepSeek models across multiple providers. To start using DeepSeek models, all you need to do is:
1. Get Your API Key from one of the providers mentioned above
2. Get your Portkey API key from [Portkey's Dashboard](https://app.portkey.ai)
3. Create virtual keys in [Portkey's Dashboard](https://app.portkey.ai/virtual-keys). Virtual Keys are an alias over your provider API Keys. You can set budget limits and rate limits for each virtual key.
Here's how you can use Portkey's unified API:
```python theme={"system"}
!pip install portkey-ai
```
```python theme={"system"}
from portkey_ai import Portkey

client = Portkey(
api_key="your-portkey-api-key",
provider="@your-virtual-key--for-chosen-provider"
)
response = client.chat.completions.create(
model="your_chosen_model", # e.g. "deepseek-ai/DeepSeek-R1" for together-ai
messages=[
{"role": "user", "content": "Your prompt here"}
]
)
print(response.choices[0].message.content)
```
That's all you need to access DeepSeek models across different providers - the same code works everywhere.
## Comparing DeepSeek R1 Against Leading Models
We've created a comprehensive cookbook comparing DeepSeek R1 with OpenAI's o1, o3-mini, and Claude 3.5 Sonnet. The cookbook compares the DeepSeek R1 model from `together-ai` with top models from OpenAI and Anthropic. We will compare the models on four different types of prompts:
1. Simple Reasoning
```python theme={"system"}
prompt = "How many times does the letter 'r' appear in the word 'strrawberrry'?"
```
2. Numerical Comparison
```python theme={"system"}
prompt2 = """Which number is bigger: 9.111 or 9.9?"""
```
3. Complex Problem Solving
```python theme={"system"}
prompt3 = """In a village of 100 people, each person knows a unique secret. They can only share information one-on-one, and only one exchange can happen per day. What is the minimum number of days needed for everyone to know all secrets? Explain your reasoning step by step."""
```
4. Coding
```python theme={"system"}
prompt4 = """Given an integer N, print N rows of inverted right half pyramid pattern. In inverted right half pattern of N rows, the first row has N number of stars, second row has (N - 1) number of stars and so on till the Nth row which has only 1 star."""
```
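As a rough sketch of how such a comparison can be run through Portkey (the virtual key slugs and model identifiers below are illustrative), you can loop the same prompt over the candidate models:

```py theme={"system"}
from portkey_ai import Portkey

# Illustrative (model, virtual key) pairs for the models being compared
candidates = [
    ("deepseek-ai/DeepSeek-R1", "@together-virtual-key"),
    ("o1", "@openai-virtual-key"),
    ("claude-3-5-sonnet-20240620", "@anthropic-virtual-key"),
]

for model, virtual_key in candidates:
    client = Portkey(api_key="your-portkey-api-key", provider=virtual_key)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]  # any of the prompts defined above
    )
    print(f"--- {model} ---\n{response.choices[0].message.content}\n")
```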
Here's the link to the cookbook to follow along, as well as the results of the comparison.
DeepSeek R1 has outperformed the top models from each provider in almost all major benchmarks. It has achieved 91.6% accuracy on MATH, 52.5% accuracy on AIME, and a Codeforces rating of 1450. This makes it one of the most powerful reasoning models available today.
## Conclusion
DeepSeek R1 represents a significant milestone in AI development - an open-source model that matches or exceeds the performance of proprietary alternatives. Through Portkey's unified API, developers can now access this powerful model across multiple providers while maintaining consistent implementation patterns.
Explore Portkey further and integrate it into your own projects. Visit the Portkey documentation at [https://docs.portkey.ai/](https://docs.portkey.ai/) for more information on how to leverage Portkey's capabilities in your workflow.
# Detecting Emotions with GPT-4o
Source: https://docs.portkey.ai/docs/guides/use-cases/emotions-with-gpt-4o
## First, grab the API keys
| [Portkey API Key](https://app.portkey.ai/) | [OpenAI API Key](https://platform.openai.com/api-keys) |
| ------------------------------------------ | ------------------------------------------------------ |
```sh theme={"system"}
pip install -qU portkey-ai openai
```
## Let's make a request
```py theme={"system"}
from openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders
portkey = OpenAI(
api_key = 'OPENAI_API_KEY',
base_url = PORTKEY_GATEWAY_URL,
default_headers = createHeaders(
provider = "openai",
api_key = 'PORTKEY_API_KEY'
)
)
emotions = portkey.chat.completions.create(
model = "gpt-4o",
messages = [{"role": "user","content":
[
{"type": "image_url","image_url": {"url": "https://i.insider.com/602ee9d81a89f20019a377c6?width=1136&format=jpeg"}},
{"type": "text","text": "What expression is this person expressing?"}
]
}
]
)
print(emotions.choices[0].message.content)
```
## Get Observability over the request
# Enforcing JSON Schema with Anyscale & Together
Source: https://docs.portkey.ai/docs/guides/use-cases/enforcing-json-schema-with-anyscale-and-together
Get the LLM to adhere to your JSON schema using Anyscale & Together AI's newly introduced JSON modes
LLMs excel at generating creative text, but production applications demand structured outputs for seamless integration. Instructing LLMs to generate output only in a specified syntax makes their behaviour more predictable. JSON is the format of choice here: it is versatile and widely used as a standard data-exchange format.
Several LLM providers offer features that help enforce JSON outputs:
* OpenAI has a feature called [JSON mode](https://platform.openai.com/docs/guides/text-generation/json-mode) that ensures that the output is a valid JSON object.
* While this is great, it doesn't guarantee adherence to your custom JSON schemas - only that the output is valid JSON.
* Anyscale and Together AI go further - they not only enforce that the output is in JSON but also ensure that the output follows any given JSON schema.
Using Portkey, you can easily experiment with models from Anyscale & Together AI and explore the power of their JSON modes:
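For instance, here's a minimal sketch of what a schema-constrained request could look like through Portkey's Python SDK, assuming a Together AI model whose JSON mode accepts a schema in the `response_format` parameter (the provider slug, model name, and schema syntax below are illustrative - check the provider's docs for the exact format):
```python theme={"system"}
from portkey_ai import Portkey

# Placeholder credentials - use your Portkey API key and the provider slug
# configured for Together AI in your Portkey dashboard
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="@your-together-ai-provider"
)

# The JSON schema we want the model's output to adhere to
recipe_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "ingredients": {"type": "array", "items": {"type": "string"}},
        "steps": {"type": "array", "items": {"type": "string"}}
    },
    "required": ["title", "ingredients", "steps"]
}

response = portkey.chat.completions.create(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # example model; any JSON-mode capable model works
    messages=[{"role": "user", "content": "Give me a pancake recipe as JSON."}],
    # Assumption: the provider accepts a schema alongside JSON mode
    response_format={"type": "json_object", "schema": recipe_schema}
)

print(response.choices[0].message.content)
```
Because the request goes through Portkey's gateway, the call is logged and measured like any other - you only swap the provider slug to compare Anyscale and Together AI side by side.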
**Track key metrics in real-time.** Monitor request volumes, success rates, latency percentiles, and token usage. Compare performance across providers to optimize routing.
**Analyze costs across providers.** See exactly how much each provider costs and identify optimization opportunities. Set budget alerts to prevent overspending.
**Debug issues with detailed logs.** Every request is logged with complete details including inputs, outputs, tokens, and latency. Filter logs by provider, status, or custom metadata.
***
## **Recipe 3: Fallback on Guardrail Violations**
Portkey Guardrails can block requests that violate your content policies, returning a **`446`** status code. You can use this to trigger a fallback, perhaps to a different model better suited for the filtered content.
#### **Step 1: Configure your guardrail**
1. Navigate to **Guardrails** → **Create**
2. Search for "word count" under Basic guardrails
3. Create a guardrail as shown below.
4. In the actions tab, select the `Deny the request if guardrail fails` flag.
5. Save the guardrail and note the `guardrail ID` for the next step.
#### **Step 2: Configure the Failure Scenario**
Create a saved **Portkey Config** in the UI.
1. Navigate to **Configs** in your Portkey dashboard and click **Create**.
2. Use a clear ID like **`fallback-on-guardrail-fail`**.
3. Paste the following JSON. Notice the **`input_guardrails`** block is nested inside the first target:
```json theme={"system"}
{
"strategy": {
"mode": "fallback",
"on_status_codes": [446]
},
"targets": [
{
"provider": "@openai-prod",
"override_params": { "model": "gpt-4o" },
"input_guardrails": ["your-guardrail-id"]
},
{
"provider": "@anthropic-prod",
"override_params": { "model": "claude-3-5-sonnet-20240620" }
}
]
}
```
4. **Save the Config**.
#### **Step 3: Run the Test**
This code sends an input that is intentionally too short, violating the guardrail's 10-20 word requirement. This will trigger the **`446`** error and the fallback.
```python theme={"system"}
import os
from portkey_ai import Portkey
portkey = Portkey()
short_input_message = "Hey chat!."  # only 2 words - fails the guardrail's 10-20 word requirement
print(f"Sending a short input ({len(short_input_message.split())} words) to a config with a 10-20 word limit...")
try:
# Apply the saved config by its ID
chat_completion = portkey.with_options(config="fallback-on-guardrail-fail").chat.completions.create(
        messages=[{"role": "user", "content": short_input_message}],
max_tokens=1024
)
print("\n✅ Success! The guardrail violation triggered a fallback.")
print(f"Final Response: {chat_completion.choices[0].message.content}")
except Exception as e:
print(f"\n❌ Failure! The fallback did not work as expected: {e}")
```
#### **Step 4: Verify the Fallback**
The request succeeds, and the response comes from Anthropic.
Go to the **Logs** page and find the request you just sent. You'll see its trace:
1. **`FAILED`**: The first attempt to `@openai-prod`, blocked by the `word_count` guardrail with a **`446`** status.
2. **`SUCCESS`**: The automatic fallback to `@anthropic-prod`, which processed the input.
### **Summary of Best Practices**
* **Test Your Configs:** Actively test your fallback logic to ensure it behaves as you expect during a real outage.
* **Be Specific with Status Codes:** Use `on_status_codes` to control precisely which errors trigger a fallback. This prevents unnecessary fallbacks on transient issues.
* **Monitor Your Logs:** The Trace View in Portkey Logs is your best tool for understanding fallback behavior, latency, and costs.
* **Consider Your Fallback Chain:** Choose fallback providers that are compatible with your use case and be mindful of their different performance and cost profiles.
# Few-Shot Prompting
Source: https://docs.portkey.ai/docs/guides/use-cases/few-shot-prompting
LLMs are highly capable of following a given structure. By providing a few examples of how the assistant should respond to a given prompt, the LLM can generate responses that closely follow the format of these examples.
Portkey enhances this capability with the ***raw prompt*** feature of prompt templates. You can easily add few-shot learning examples to your templates with *raw prompt* and dynamically update them whenever you want, without needing to modify the prompt templates!
## How does it work?
Let's consider a use case where, given a candidate profile and a job description, the LLM is expected to output candidate notes in a specific JSON format.
### This is how our raw prompt looks:
```JSON theme={"system"}
[
{
"role": "system",
"message": "You output candidate notes in JSON format when given a candidate profile and a job description.",
},
{{few_shot_examples}},
{
"role": "user",
"message": "Candidate Profile: {{profile}} \n Job Description: {{jd}}"
},
]
```
### Let's define our variables:
As you can see, we have added variables `few_shot_examples`, `profile`, and `jd` in the above examples.
```
profile = "An experienced data scientist with a PhD in Computer Science and 5 years of experience working with machine learning models in the healthcare industry."
jd = "We are seeking a seasoned data scientist with a strong background in machine learning, ideally with experience in the healthcare sector. The ideal candidate should have a PhD or Master's degree in a relevant field and a minimum of 5 years of industry experience."
```
### And now let's add some examples with the expected JSON structure:
```JSON theme={"system"}
few_shot_examples =
[
{
"role": "user",
"content": "Candidate Profile: Experienced software engineer with a background in developing scalable web applications using Python. Job Description: We’re looking for a Python developer to help us build and scale our web platform.",
},
{
"role": "assistant",
"content": "{'one-line-intro': 'Experienced Python developer with a track record of building scalable web applications.', 'move-forward': 'Yes', 'priority': 'P1', 'pros': '1. Relevant experience in Python. 2. Has built and scaled web applications. 3. Likely to fit well with the job requirements.', 'cons': 'None apparent from the provided profile.'}",
},
{
"role": "user",
"content": "Candidate Profile: Recent graduate with a degree in computer science and a focus on data analysis. Job Description: Seeking a seasoned data scientist to analyze large data sets and derive insights."
},
{
"role": "assistant",
"content": "{'one-line-intro': 'Recent computer science graduate with a focus on data analysis.', 'move-forward': 'Maybe', 'priority': 'P2', 'pros': '1. Has a strong educational background in computer science. 2. Specialized focus on data analysis.', 'cons': '1. Lack of professional experience. 2. Job requires a seasoned data scientist.' }"
}
]
```
In this configuration, `{{few_shot_examples}}` is a placeholder for the few-shot learning examples, which are dynamically provided and can be updated as needed. This allows the LLM to adapt its responses to the provided examples, facilitating versatile and context-aware outputs.
## Putting it all together in Portkey's prompt manager:
1. Go to the "Prompts" page on [https://app.portkey.ai/](https://app.portkey.ai/organisation/4e501cb0-512d-4dd3-b480-8b6af7ef4993/9eec4ebc-1c88-41a2-ae5d-ed0610d33b06/collection/17b7d29e-4318-4b4b-a45b-1d5a70ed1e8f) and **Create** a new Prompt template with your preferred AI provider.
2. Select Chat mode to enable the Raw Prompt feature.
3. Click on it and paste the [raw prompt code from above](/guides/use-cases/few-shot-prompting#this-is-how-our-raw-prompt-would-look). And that's it! You have your **dynamically updatable** few-shot prompt template ready to deploy.
## Deploying the Prompt with Portkey
Deploying your prompt template to an API is extremely easy with Portkey. You can use our [Prompt Completions API](/portkey-endpoints/prompts/prompt-completion) to use the prompt we created.
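As a quick illustration, here's a minimal sketch of calling the deployed template with the Portkey Python SDK. The prompt ID and variable values are placeholders; depending on how your raw prompt parses the `few_shot_examples` variable, you may need to pass the examples as a JSON array (as below) or as a JSON-encoded string:
```python theme={"system"}
from portkey_ai import Portkey

# Placeholder API key and prompt ID - substitute your own values
portkey = Portkey(api_key="PORTKEY_API_KEY")

completion = portkey.prompts.completions.create(
    prompt_id="YOUR_PROMPT_ID",
    variables={
        # few-shot examples injected at runtime - update these without touching the template
        "few_shot_examples": [
            {"role": "user", "content": "Candidate Profile: ... Job Description: ..."},
            {"role": "assistant", "content": "{'one-line-intro': '...', 'move-forward': 'Yes'}"}
        ],
        "profile": "An experienced data scientist with a PhD in Computer Science...",
        "jd": "We are seeking a seasoned data scientist with a strong ML background..."
    }
)

# The response mirrors a chat completion
print(completion.choices[0].message.content)
```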
The transformation process:
* **Original**: Contains actual PII like names, emails, SSNs
* **Final (Transformed)**: PII replaced with numbered placeholders
* **Status**: Shows if transformation occurred
## Setting Up Portkey's Native PII Detection
### Step 1: Create a PII Detection Guardrail
1. Navigate to **Guardrails** → **Create**
2. Search for "Detect PII" under PRO guardrails
3. Select PII categories to detect:
* **Phone Numbers**: Mobile and landline numbers
* **Email Addresses**: Personal and corporate emails
* **Location Information**: Addresses, cities, coordinates
* **IP Addresses**: IPv4 and IPv6 addresses
* **Social Security Numbers**: US SSN format
* **Names**: First names, last names, full names
* **Credit Card Information**: Card numbers
### Step 2: Enable PII Redaction
Toggle the **Redact PII** option to automatically replace detected PII with placeholders.
### Step 3: Configure Guardrail Actions
Set up how your guardrail should behave:
* **Async**: Run checks without blocking (default: TRUE)
* **Deny**: Block requests with PII (default: FALSE)
* **On Success/Failure**: Send feedback for monitoring
### Step 4: Add to Config and Use
Once you save your guardrail, you'll get a Guardrail ID. Add it to your config:
```json theme={"system"}
{
"input_guardrails": ["gr-pii-detection-xxx"]
}
```
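For example, here's a minimal sketch of using that guardrail from the Python SDK by passing the config inline (the guardrail and provider slugs below are placeholders):
```python theme={"system"}
from portkey_ai import Portkey

# Placeholder keys and slugs - replace with your own values
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    provider="@openai-prod",                               # assumed provider slug
    config={"input_guardrails": ["gr-pii-detection-xxx"]}  # the Guardrail ID from Step 4
)

# Any PII in this request is detected (and redacted, if Redact PII is enabled)
# before the prompt reaches the LLM
response = portkey.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Email john.doe@example.com about the renewal."}]
)

print(response.choices[0].message.content)
```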
1. **Original Request**: What the user sent
2. **Final (Transformed)**: What was sent to the LLM
3. **Guardrail Status**: Shows if PII detection succeeded
4. **Detected Entities**: List of all PII found
Example log entry:
```
Guardrails
✓ pii - 1 successful
No PII (when no PII detected)
```
### Understanding Response Codes
* **200**: Request successful (PII redacted if found)
* **246**: PII detected but request continued (Deny = false)
* **446**: Request blocked due to PII (Deny = true)
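If you've set **Deny** to true, a blocked request surfaces as an error in your client. Here's a minimal sketch of handling it; the config ID is a placeholder and the exact exception class depends on your SDK version, so a broad catch is used:
```python theme={"system"}
from portkey_ai import Portkey

# Assumes the referenced config has the PII guardrail attached with Deny enabled
portkey = Portkey(api_key="PORTKEY_API_KEY", config="your-config-with-pii-guardrail")

try:
    response = portkey.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "My SSN is 123-45-6789, please store it."}]
    )
    # 200 or 246: the request went through (with PII redacted or flagged)
    print(response.choices[0].message.content)
except Exception as e:
    # 446: the guardrail denied the request before it reached the LLM
    print(f"Request blocked by guardrail: {e}")
```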
For support, join the [Portkey community](https://discord.gg/portkey-llms-in-prod-1143393887742861333).
# How to use OpenAI SDK with Portkey Prompt Templates
Source: https://docs.portkey.ai/docs/guides/use-cases/how-to-use-openai-sdk-with-portkey-prompt-templates
Portkey's Prompt Playground allows you to test and tinker with various hyperparameters without any external dependencies and deploy prompts to production seamlessly. Moreover, all team members can use the same prompt template, ensuring that everyone works from the same source of truth.
Using the OpenAI SDK alongside Portkey's APIs, you can use these prompt templates directly in your code.
## 1. Creating a Prompt Template
Portkey's prompt playground enables you to experiment with various LLM providers. It acts as a definitive source of truth for your team, and it versions each snapshot of model parameters, allowing for easy rollback. We want to create a chat completion prompt with `gpt-4` that tells a story about any user-desired topic.
To do this:
1. Go to **[www.portkey.ai](http://www.portkey.ai)** and open the Dashboard.
2. Click on **Prompts** and then the **Create** button.
3. You are now in the Prompt Playground.
Spend some time playing around with different prompt inputs and changing the hyperparameters. The following settings seemed most suitable and generated a story that met expectations.
The list of parameters in my prompt template:
| Parameter | Value |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| System | You are a very good storyteller who covers various topics for the kids. You narrate them in very intriguing and interesting ways. You tell the story in less than 3 paragraphs. |
| User | Tell me a story about {{topic}} |
| Max Tokens | 512 |
| Temperature | 0.9 |
| Frequency Penalty | -0.2 |
When you look closely at the User message, you'll find `{{topic}}`. Portkey treats anything in double curly braces as a dynamic variable, so a string can be passed to this prompt at runtime. This makes the prompt far more useful, since it can generate stories on any topic.
Once you are happy with the Prompt Template, hit **Save Prompt**. The Prompts page displays saved prompt templates and their corresponding prompt ID, serving as a reference point in our code.
Next up, let’s see how to use the created prompt template to generate chat completions through OpenAI SDK.
## 2. Retrieving the prompt template
Fire up your code editor and import the request client, `axios`. This will allow you to POST to Portkey's render endpoint and retrieve prompt details that can be used with the OpenAI SDK.
We will use `axios` to make a `POST` call to `/prompts/${PROMPT_ID}/render` endpoint along with headers (includes [Portkey API Key](https://portkey.ai/docs/api-reference/authentication#obtaining-your-api-key)) and body that includes the prompt variables required in the prompt template.
For more information about Render API, refer to the [docs](https://portkey.ai/docs/api-reference/prompts/render).
```js theme={"system"}
import axios from 'axios';
const PROMPT_ID = '';
const PORTKEYAI_API_KEY = '';
const url = `https://api.portkey.ai/v1/prompts/${PROMPT_ID}/render`;
const headers = {
'Content-Type': 'application/json',
'x-portkey-api-key': PORTKEYAI_API_KEY
};
const data = {
variables: { topic: 'Tom and Jerry' }
};
let {
data: { data: promptDetail }
} = await axios.post(url, data, { headers });
console.log(promptDetail);
```
We get prompt details as a JS object logged to the console:
```js theme={"system"}
{
model: 'gpt-4',
n: 1,
top_p: 1,
max_tokens: 512,
temperature: 0.9,
presence_penalty: 0,
frequency_penalty: -0.2,
messages: [
{
role: 'system',
content: 'You are a very good storyteller who covers various topics for the kids. You narrate them in very intriguing and interesting ways. You tell the story in less than 3 paragraphs.'
},
{ role: 'user', content: 'Tell me a story about Tom and Jerry' }
]
}
```
## 3. Sending requests through OpenAI SDK
This section will teach you to use the prompt details JS object we retrieved earlier and pass it as an argument to the instance of the OpenAI SDK when making the chat completions call.
Let’s import the necessary libraries and create a client instance from the OpenAI SDK.
```js theme={"system"}
import OpenAI from 'openai';
import { createHeaders, PORTKEY_GATEWAY_URL } from 'portkey-ai';
const client = new OpenAI({
apiKey: 'USES_VIRTUAL_KEY',
baseURL: PORTKEY_GATEWAY_URL,
defaultHeaders: createHeaders({
provider: 'openai',
apiKey: `${PORTKEYAI_API_KEY}`,
virtualKey: `${OPENAI_VIRTUAL_KEY}`
})
});
```
We are importing `portkey-ai` to use its utilities to change the base URL and the default headers. If you are wondering what virtual keys are, refer to [Portkey Vault documentation](https://portkey.ai/docs/product/ai-gateway-streamline-llm-integrations/virtual-keys).
The prompt details we retrieved are passed as an argument to the chat completions creation method.
```js theme={"system"}
let TomAndJerryStory = await generateStory('Tom and Jerry');
console.log(TomAndJerryStory);
async function generateStory(topic) {
const data = {
variables: { topic: String(topic) }
};
let {
data: { data: promptDetail }
} = await axios.post(url, data, { headers });
const chatCompletion = await client.chat.completions.create(promptDetail);
return chatCompletion.choices[0].message.content;
}
```
This time, run your code and see the story we set out to generate logged to the console!
```
In the heart of a bustling city, lived an eccentric cat named Tom and a witty little mouse named Jerry. Tom, always trying to catch Jerry, maneuvered himself th...(truncated)
```
## Bonus: Using Portkey SDK
The official Portkey Client SDK has a prompts completions method whose signature is similar to OpenAI's chat completions. You can invoke a prompt template just by passing arguments to the `promptID` and `variables` parameters.
```js theme={"system"}
const promptCompletion = await portkey.prompts.completions.create({
promptID: 'Your Prompt ID',
variables: {
topic: 'Tom and Jerry'
}
});
```
## Conclusion
We've now finished writing a small NodeJS program that retrieves prompt details from the Prompt Playground using a prompt ID, then makes a chat completion call through the OpenAI SDK to generate a story on the desired topic.
With this approach, you can focus on improving prompt quality across all supported LLMs and simply reference the templates at runtime.
## Best Practices
#### A. Metadata Schema Enforcement
**What is Metadata?**
Metadata in Portkey is a set of custom key-value pairs attached to every AI request. Think of it as tags that help you track who's using AI, how they're using it, and what it's costing you. This becomes crucial when you have thousands of customers making millions of requests. A short sketch of attaching metadata to a request follows the list below.
For example, metadata helps you answer questions like:
* Which customer made this request?
* What feature in your app triggered it?
* Which subscription tier should be billed?
* What was the user trying to accomplish?
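Here's a minimal sketch of attaching metadata to a request with the Python SDK - the key names shown are illustrative, not required, and the provider slug is a placeholder:
```python theme={"system"}
from portkey_ai import Portkey

# Placeholder API key and provider slug - use values from your Portkey dashboard
portkey = Portkey(api_key="PORTKEY_API_KEY", provider="@openai-prod")

# Metadata travels with the request and shows up in logs and analytics,
# so you can later slice usage and cost by any of these keys
response = portkey.with_options(
    metadata={
        "customer_id": "cust_12345",        # which customer made this request
        "feature": "document-summarizer",   # which feature triggered it
        "subscription_tier": "enterprise"   # which tier should be billed
    }
).chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this contract..."}]
)
```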
For each workspace, configure:
```yaml theme={"system"}
Workspace Name: workspace-enterprise
Description: Enterprise tier customers with premium access
Metadata:
tier: enterprise
default_monthly_budget: 500
support_level: premium
```
Create three workspaces (for example):
* `workspace-enterprise` - For your highest-tier customers
* `workspace-professional` - For mid-tier customers
* `workspace-starter` - For entry-level customers
## Step 3: Connecting AI Providers
### Creating Provider Integrations
Integrations securely store your provider credentials while enabling controlled access. Think of them as secure vaults that your workspaces can access without ever exposing the actual API keys.
Set up your primary provider:
1. Navigate to **Integrations** → **Create New Integration**
2. Select your AI provider (e.g., OpenAI)
3. Configure the integration:
### Workspace Provisioning
When creating the integration, configure which workspaces can access it and set appropriate budget and rate limits for your integration:
For each workspace, click the edit icon to configure:
```yaml theme={"system"}
Enterprise Tier:
Access: ✓ Enabled
Budget Limit: $500/month
Rate Limit: 1000 requests/minute
Alert Threshold: 80% ($400)
Professional Tier:
Access: ✓ Enabled
Budget Limit: $100/month
Rate Limit: 100 requests/minute
Alert Threshold: 80% ($80)
Starter Tier:
Access: ✓ Enabled
Budget Limit: $25/month
Rate Limit: 10 requests/minute
Alert Threshold: 80% ($20)
```
### Model Provisioning
Define which models your integration can access:
Configure model access strategically by tier:
```yaml theme={"system"}
- gpt-4o (Advanced reasoning)
- gpt-5 (Complex tasks)
- gpt-5-nano (Basic queries)
```
### Model Rules with Guardrails (Advanced)
For fine-grained control over model access based on metadata, use Portkey's Guardrails feature:
1. Navigate to **Guardrails** → **Model Rule Guardrail**
2. Create a new guardrail with your routing rules:
```json theme={"system"}
{
"model_rules": {
"defaults": ["gpt-3.5-turbo"],
"metadata_routing": {
"subscription_tier": {
"enterprise": ["gpt-4-turbo", "claude-3-opus"],
"professional": ["gpt-4", "claude-3-sonnet"],
"starter": ["gpt-3.5-turbo"]
}
}
}
}
```
3. Attach the guardrail at the workspace level by going to [Workspace Control](https://app.portkey.ai/workspace-control/)
4. Alternatively, attach it to individual API keys using configs
```python theme={"system"}
from portkey_ai import Portkey
def get_user_limits(workspace_slug, portkey_api_key, user_email):
"""Get rate and usage limits for a user by email"""
portkey = Portkey(api_key=portkey_api_key)
api_keys = portkey.api_keys.list(workspace_id=workspace_slug)
# Filter by user email in metadata
for key in api_keys.get('data', []):
metadata = key.get('defaults', {}).get('metadata') or key.get('metadata')
# Check if metadata contains user_email
if metadata and isinstance(metadata, dict) and metadata.get('user_email') == user_email:
print(f"User: {user_email}")
print(f"API Key: {key.get('name')}")
# Rate limits
for limit in key.get('rate_limits', []):
print(f"Rate Limit: {limit['value']} {limit['unit']}")
# Usage limits
usage = key.get('usage_limits') or {}
print(f"Usage Limit: ${usage.get('credit_limit')} {usage.get('periodic_reset')}")
print(f"Alert Threshold: {usage.get('alert_threshold')}")
return
# If no metadata match, show first available key's limits
print(f"No metadata match for {user_email}. Showing available limits:")
if api_keys.get('data'):
key = api_keys['data'][0]
for limit in key.get('rate_limits', []):
print(f"Rate Limit: {limit['value']} {limit['unit']}")
usage = key.get('usage_limits') or {}
print(f"Usage Limit: ${usage.get('credit_limit')} {usage.get('periodic_reset')}")
print(f"Alert Threshold: {usage.get('alert_threshold')}%")
# Usage
if __name__ == "__main__":
get_user_limits(
workspace_slug="your-workspace-slug",
portkey_api_key="your-portkey-admin-api-key",
user_email="your-customer-email-metadata-value" # in this example assuming your user api keys have user_email metadata value
)
# Expected output for your data:
# Rate Limit: 100 rpm
# Usage Limit: $100 monthly
# Alert Threshold: 80%
```
## Step 6: Observability and Analytics
### Accessing Analytics
Portkey provides comprehensive analytics at multiple levels. Access your analytics dashboard to monitor:
**Key Metrics to Track:**
* Total requests by customer tier
* Cost distribution across models
* Error rates and types
* Peak usage times
* Customer usage patterns
## Conclusion
You've successfully built a multi-tenant AI infrastructure that provides:
* **Individual customer control** with per-user API keys and budgets
* **Tiered access** to models based on subscription levels
* **Automatic enforcement** of spending and rate limits
* **Complete visibility** into usage patterns and costs
* **Enterprise security** with encrypted keys and audit trails
Your customers get powerful AI capabilities with transparent limits. Your business gets predictable costs and complete control. Your engineering team gets a simple, maintainable solution.
## Next Steps
Explore these related resources to get the most out of your private LLM integration:
### 2. Budget Control & Governance
Set spending limits, track costs by department, and implement rate limiting to prevent unexpected usage spikes.
### 3. Reliability Features
Add fallbacks, automatic retries, and timeouts to make your Computer Use applications more robust in production environments.
### 4. Secure API Key Management
Store your OpenAI API keys securely using Portkey's Virtual Keys instead of exposing them directly in your code.
## Next Steps
For more details on setting up Portkey for enterprise AI deployments, see these resources:
This gives you an instant breakdown of LLM expenses per user or team.
***
## Step 3: Fetch Cost Data Programmatically via the Analytics API
For deeper integrations, the **Analytics API** enables real-time cost tracking inside your own application. This is useful for:
* **User-facing billing dashboards** (show users their LLM usage in real time)
* **Automated cost monitoring** (trigger alerts when a user’s spending exceeds a threshold)
* **Enterprise reporting** (export data for budget forecasting)
**Understanding the Analytics API**
The API provides comprehensive cost analytics data across any metadata dimension you've configured. You can query historical data, aggregate costs across different timeframes, and access detailed metrics for each metadata value.
Here's what you can access through the API:
```json theme={"system"}
"data": [
{
"metadata_value": "kreacher",
"requests": 4344632,
"cost": 3887.3066999996863,
"avg_tokens": 447.3689256075083,
"avg_weighted_feedback": 4.2,
"requests_with_feedback": 10,
"last_seen": "2025-02-03T07:19:30.000Z",
"object": "analytics-group"
},
{
...more such objects
    }
  ]
}
```
These metrics provide insights into costs, usage patterns, and efficiency. The response includes:
* Total requests and costs per metadata value
* Average token usage for optimization analysis
* User feedback metrics for quality assessment
* Timestamp data for temporal analysis
## Step 4: Tracking User Costs using Portkey's Analytics API
Before making your first API call, you'll need to obtain an API key from the Portkey Dashboard. This key requires analytics scope access, which you can configure in your API key settings.
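Here's a minimal sketch of querying the API with plain `requests`. The endpoint path and parameter names below are assumptions based on the metadata-grouping response shown above - confirm the exact values in the Analytics API reference before relying on them:
```python theme={"system"}
import requests

PORTKEY_API_KEY = "YOUR_PORTKEY_API_KEY"  # must have analytics scope

# Assumed endpoint shape: group analytics by a metadata key (here "_user")
url = "https://api.portkey.ai/v1/analytics/groups/metadata/_user"

response = requests.get(
    url,
    headers={"x-portkey-api-key": PORTKEY_API_KEY},
    params={
        # assumed parameter names for the reporting window and page size
        "time_of_generation_min": "2025-01-01T00:00:00Z",
        "time_of_generation_max": "2025-02-01T00:00:00Z",
        "page_size": 50
    }
)
response.raise_for_status()

for group in response.json().get("data", []):
    print(f"{group['metadata_value']}: ${group['cost']:.2f} across {group['requests']} requests")
```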