Enterprise Gateway - Portkey Docs

Schedule Call

Discuss how Portkey’s AI Gateway can enhance your organization’s AI infrastructure

1.11.6

2025-06-21

v1.11.6

Provider Updates

Groq: Added support for service_tier parameter in the Groq provider configuration
Anthropic: Added support for Anthropic’s prompt caching for tool results and tool use
Anthropic: Fixed multi turn tool calling when arguments to the tool call is empty

Improvements and Fixes

Fixed an issue with Auth enabled Aws Redis Cache with Password and cluster mode
Handled Webhook Guardrail errors and return verdict with the correct status and error

1.11.5

2025-06-18

v1.11.5

Guardrails

Added support for metadata keys plugin to enforce metadata keys from the request.

1.11.4

2025-06-17

v1.11.4

Provider Updates

Bedrock: Added support for AssumedRole for bedrock application inference profiles
Bedrock Multimodal Embeddings: Added support for multimodal embeddings for providers cohere and titan.
Azure Foundry: Added support for createTranscription,createTranslation, imageGeneration, batch and files endpoints.
Anthropic: Added Support for computer use tool.
Anthropic: Added support for file_url and mime_type for file content parts in Anthropic requests.
VertexAI: Added support for Gemini/Vertex Thinking mode.

Cache Improvements

Added support for Azure Redis with auth modes EntraID and ManagedIdentity

Fixes And Improvements

Improvements for Redis Cache
- Added support for separate username and password for Redis Cache. Use REDIS_USERNAME and REDIS_PASSWORD environment variables.
- Added support for Azure Redis Cache. Use CACHE_STORE with azure-redis as value.
- Added support for Managed Identity for Azure Managed Redis.
  - You can pass AZURE_REDIS_AUTH_MODE and AZURE_REDIS_MANAGED_CLIENT_ID for a different auth setup.
  - Defaults to AZURE_AUTH_MODE and AZURE_MANAGED_CLIENT_ID if not provided
- Added support for Entra ID for Azure Redis Cache.
  - You can pass AZURE_REDIS_AUTH_MODE and AZURE_REDIS_ENTRA_CLIENT_ID, AZURE_REDIS_ENTRA_CLIENT_SECRET, AZURE_REDIS_ENTRA_TENANT_ID for a different auth setup.
  - Defaults to AZURE_AUTH_MODE and AZURE_ENTRA_CLIENT_ID, AZURE_ENTRA_CLIENT_SECRET, AZURE_ENTRA_TENANT_ID if not provided
HTTPS Proxy
- Added HTTPS Proxy support for all the external calls.
- Pass HTTPS_PROXY environment variable to enable this feature.
Added support for virtual key inclusion for custom log if passed in headers.
Fixed issue with proxy calls not working with configs for some providers.

1.11.3

2025-06-06

v1.11.3

Observability

Prometheus Metrics are migrated to use endpoints instead of path for all the metrics

Fixes And Improvements

Added a global error handler for all the unhandled exceptions to prevent server crashes.
Updated JWT Plugin to validate iat field

1.11.2

2025-06-03

v1.11.2

Fixes And Improvements

Fixed IRSA Web Identity token handling issue for Log Store that was introduced in v1.11.1

1.11.1

2025-06-02

v1.11.1

Provider Updates

OpenAI: Added support for background and service_tier parameters
Azure: Added support for custom hosts and private links for Azure Plugins
Bedrock: Added native support for inference profiles Ref

OTEL Traces Collector endpoints

Added new endpoint /v1/otel/v1/traces to collect any OTEL traces as Portkey traces

Log Exports available on Data Plane

Log exports are now available on Data Plane
Export logs without them being sent to Control Plane Docs Link
Please note that, Dataservice is required for log exports to work via Data Plane.

Fixes And Improvements

Fixed cache bug where Bedrock and Vertex requests were getting responses from the wrong models
Added support for fetching environment variables from mounted paths

1.11.0

2025-05-17

v1.11.0

Provider Updates

Bedrock: Fixed cache token calculation for streaming requests

Fixes And Improvements

Added mTLS support for Gateway to internal services in air-gapped deployments

1.10.23

2025-05-15

v1.10.23

Provider Updates

Vertex: Added support for dimensions parameter in multi-modal embeddings

Plugins

JWT Plugin: Added JWT authentication for runtime validation

Fixes And Improvements

Added pricing support for gpt-image-1 model

1.10.22

2025-05-08

v1.10.22

Enhanced Anthropic PDF Support

Added transformation logic to support OpenAI-spec-compatible file content parts in request.
Introduced two new Portkey parameters for the file content parts: file_url and mime_type.
Applicable for Anthropic, Bedrock-Anthropic and VertexAI-Anthropic models.
Docs

OTEL Configurations

Added two new environment variables:
- OTEL_SERVICE_NAME: Sets the service.name resource attribute value.
- OTEL_RESOURCE_ATTRIBUTES: Comma-separated key=value pairs which will be sent as individual resource attributes.

New Providers

Ncompass: Supports chat completions endpoint.
Lepton: Supports chat completions, completions and transcriptions endpoints.
Snowflake Cortex: Supports chat completions endpoint.

Provider Updates

Groq: Handled an exception that occurred when stream_options was included in the request, because the response transformer was not handling usage chunk mapping as expected.
Workers AI: Added support for /images/generations route.
Openrouter: Added support for usage request parameter and response mapping.
Deepinfra: Handled ping event returned in stream and mapped usage field returned in the response.
VertexAI: Now returns 400 (instead of 500) for empty model validation errors.

Fixes And Improvements

For providers except OpenAI and Azure-OpenAI, updated the value of object field in chat completions response from chat_completion to chat.completion for OpenAI spec compliance.
Proxy (Passthrough) Requests: Fixed endpoint construction logic, which was affecting a few provider-route combinations.
Unified Batches: Fixed issue where embeddings batch output was being returned as empty.
Prometheus: Fixed issue where model label was set as N/A for Bedrock requests.

1.10.21

2025-04-30

v1.10.21

AWS Bedrock Prompt Caching

Added support for AWS Bedrock prompt caching.
Docs Link

VertexAI Gemini 2.5 Thinking Param Support

Added support for the thinking settings parameters for VertexAI.

General Purpose File Upload For VertexAI

Documentation coming soon.

Provider Updates

Azure-OpenAI and Azure Foundry: Added cost calculation support for OpenAI finetuned models.
Bedrock:
- Handled tool role messages with empty content to avoid validation errors.
- Added response_format support for Deepseek partner models
VertexAI:
- Handled the unsupported $schema property in tools properties JSON Schema.
- Inference support for fine-tuned Gemini models.
Groq: Added support for translations, transcriptions and speech endpoints.

Fixes And Improvements

Fixed blocklist handling for Azure Content Safety guardrail.
Fixed Fireworks dataset upload validation error.

1.10.20

2025-04-23

v1.10.20

OpenAI Embeddings Latency Improvements

Improved response handling for OpenAI Embeddings resulting in significant reduction in response processing latency.

Strict Metadata Enforcement

Updated the preference for metadata logging. The new order is Workspace Default Metadata > API Key Default Metadata > Incoming Request Metadata.
This provides better control to organisation and workspace admins. Values set by admins cannot be overridden by request level metadata fields.

Strict Default Config Enforcement

Added support to disable default config override for API keys. If config override is not allowed and user tries to send a new config in request as well, Gateway will throw a 400 error.

Provider Updates

VertexAI: If a batch record failed on the provider’s end, the error will be retained in the final batch output file.
AzureOpenAI: Fixed URL path construction logic for non-completions requests like batches and files where an extra /v1 was getting added in the final URL, causing request failures.

Fixes And Improvements

Fixed an edge case where Batches, Files and Fine-tune endpoint threw an error when the passed config had targets field with a single virtual key in it.

1.10.19

2025-04-18

v1.10.19

OTEL Metrics Push

Added support for pushing Portkey Clickhouse analytics (traces and spans) to OTEL collector.
The following environment variables will be used to configure OTEL collector:
- OTEL_PUSH_ENABLED
- OTEL_ENDPOINT

Milvus Vector Store for Semantic Caching

Added support for Milvus vector store for semantic caching.
The following Vector stores are now supported:
- pinecone
- milvus
The following environment variables will be used to configure the Milvus vector store:
- VECTOR_STORE
- VECTOR_STORE_ADDRESS
- VECTOR_STORE_API_KEY
- VECTOR_STORE_COLLECTION_NAME

Azure Guardrails Support

Added support for Azure Content Safety API
Added support for PII detection with Azure Language Service

Prompts Render Endpoint

Prompts Render endpoint is now a part of the Gateway. It is available at /v1/prompts/:id/render.

Provider Updates

Vertex AI: Added support for dimensions for embeddings

Minor Enhancements

Prometheus Metric: Added portkey_processing_time_excluding_last_byte_ms metric which provides Portkey processing time excluding the LLM last byte diff latency (llm_last_byte_diff_duration_milliseconds).

1.10.18

2025-04-16

v1.10.18 (Redacted)

Redaction notice

This release introduced a critical bug in budget enforcement. We are redacting this release and will be releasing a patch with out Workspace Budget and related changes.

Workspace Level Usage and Rate Limits

Organisations can now enforce usage limits for each workspace
Organisations can now enforce rate limits for each workspace

OTEL Metrics Push

Added support for pushing Portkey Clickhouse analytics (traces and spans) to OTEL collector.
The following environment variables will be used to configure OTEL collector:
- OTEL_PUSH_ENABLED
- OTEL_ENDPOINT

Milvus Vector Store for Semantic Caching

Added support for Milvus vector store for semantic caching.
The following Vector stores are now supported:
- pinecone
- milvus
The following environment variables will be used to configure the Milvus vector store:
- VECTOR_STORE
- VECTOR_STORE_ADDRESS
- VECTOR_STORE_API_KEY
- VECTOR_STORE_COLLECTION_NAME

Azure Guardrails Support

Added support for Azure Content Safety API
Added support for PII detection with Azure Language Service

Provider Updates

Vertex AI: Added support for dimensions for embeddings

1.10.17

2025-04-11

v1.10.17

OpenAI and Azure OpenAI Response API

Added end-to-end support for the Response API.
Implemented caching for stream requests.
Introduced cost calculation for tools like web_search, file_search and code_execution.

Azure AI Foundry Enhancements

Updated the existing Azure Inference integration to directly accept endpoints from the Azure Foundry dashboard.

Retry Enhancements

Introduced a new retry setting use_retry_after_header. When set to true, if the provider returns the x-retry-after or x-retry-after-ms headers, Gateway will use these headers for retry wait times instead of applying the default exponential backoff for 429 responses.

Configurable Default Cache TTL

Default max cache TTL can now be set at the organisation level.

Provider Updates

Azure OpenAI: Added support for logprobs and top_logprobs request parameters.
Perplexity: Added support for response_format and search_recency_filter request parameters.
AWS Bedrock: Handled empty assistant tools messages containing only newline characters (\n\n).

Improvements

Gateway will now populate the model field in responses for the /chat/completions API if the providers do not natively return this field, ensuring alignment with the OpenAI signature.

1.10.16

2025-04-09

v1.10.16

Improvements

File Upload:
- Handle file upload failures for Bedrock in some scenarios
Unified Batch API:
- Return error_file_id content in the batch output for failed file uploads for OpenAI and Azure OpenAI providers.

1.10.15

2025-04-02

v1.10.15

Improvements

File Upload:
- Support for uploading large files to Providers and Data Service
Allow users to pass custom mime-types in the request body. For example:

    {
    "model": "gemini-1.5-pro",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant!"
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "<image-url>",
                        "mime_type": "image/jpeg"
                    }
                }
            ]
        }
    ]
    }

1.10.14

2025-03-28

v1.10.14

Enforce Organisation And Workspace Guardrails

It is now possible to enforce guardrails at organisation and workspace levels, which will be applied to all requests.
Documentation: Workspace-Level Guardrails, Organisation-Level Guardrails

Unified Finetuning APIs for Fireworks

Extended the existing unified finetuning APIs to support Fireworks provider.

Pricing Updates

Added support for calculating Perplexity search cost and Gemini grounding cost.

Updated Unified API Signature For Anthropic Extended Thinking

Updated the unified API signature for Extended thinking which was introduced in v1.10.12 to ensure that OpenAI compliant field of the response remain untouched regardless of strict_open_ai_compliance flag.
More Details:

Unified Batches API Improvements

custom_id will be preserved in the VertexAI batch output.
Fixed some issues with batches cost calculation.

Logging Updates

Non-OpenAI compliant fields like groundingMetadata (Gemini Grounding), citations (Perplexity Search) and extended thinking response will now be logged for stream responses. Previously, these fields were not logged specifically for streaming response.

Provider Updates

Fireworks: Added support for logprobs and top_logprobs parameters.

Fixes and Improvements

Added new environment variable (AWS_ENDPOINT_DOMAIN) which can be used to override the default value (amazonaws.com)
Fixed an edge case where before_request_hook failures were not getting flagged with 246 response status code for cached and non-cached stream responses.

1.10.13

2025-03-20

v1.10.13

Unified Batches APIs for VertexAI Embedding

Added support for batch processing of embeddings with Vertex AI.

Provider Updates

AWS Bedrock
- Multi-Turn Conversation With Tools:
  - Handled assistant messages where content is set as null and tool_calls are passed.
OpenAI
- Fixed an edge case (introduced in the previous version) which was causing issues in cost calculation of fine-tuned models.

Fixes and Improvements

Fixed batch pricing calculation issue for VertexAI and Anthropic Bedrock models.
Fixed an edge case where the x-portkey-retry-attempt-count response header was set to -1 even when no retries were configured.
Improved handling to skip stream mode detection for irrelevant request types. For example: stream mode detection should not happen for any GET requests as it is not supported.
Removed redundant AWS credential fetch failures at boot time.

1.10.12

2025-03-18

v1.10.12

Real-Time Model Pricing Sync

Model pricing configs are no longer coupled with gateway builds.
For hybrid deployments, model pricing configs will be fetched from the control plane.

Unified API Signature For Anthropic Thinking

Introduced a unified API signature to support single-turn and multi-turn conversations with Anthropic Extended Reasoning across Anthropic, AWS Bedrock and VertexAI.
More Details:

Prometheus Metric Updates

Added a new metric (llm_last_byte_diff_duration_milliseconds) to track LLM last byte latency for chunked JSON responses.
Added a new label (stream) for all metrics. Possible values: 0/1

Guardrails Updates

AWS Bedrock: Added handling to flag regex patterns returned by the guardrail.

Provider Updates

Azure OpenAI: Mapped the correct model name from multi-deployment virtual keys.

Fixes and Improvements

Portkey 500s are now logged in the console for debugging.

Internal POD to POD HTTPS Support

Added support for internal POD to POD HTTPS communication.
This can be enabled by mounting a volume with certificate and key.
TLS_KEY_PATH and TLS_CERT_PATH environment variables will be used to fetch the certificate and key from the volume.

1.10.11

2025-03-06

v1.10.11

Provider Updates

AWS Bedrock
- Added support for encryption key usage when uploading files to S3.
VertexAI:
- Minor updates to streamline the unified spec for batches and fine-tune APIs.
- Updated pricing for gemini-2.0-flash-lite models.
- Added support for webm mimeType.
Openrouter
- Mapped the usage object for streaming responses.
Azure Inference
- Replaced extra-parameters: ignore with extra-parameters: drop due to deprecation by Azure.
OpenAI and Azure OpenAI
- Update pricing for GPT 4.5 models

1.10.10

2025-02-27

v1.10.10

Unified Finetuning APIs for VertexAI

Extended the existing unified finetuning APIs to support VertexAI.
The File-upload and transformations will be done according to the provider requirements.

Body Params Support in Conditional Router

Added support for using params to specify body fields in conditional router queries. Previously, only metadata-based routing was supported.

Streaming Cache Responses Optimization

Increased stream chunk content size from 1 token to 125 tokens for cached responses. This reduces the number of chunks significantly (e.g., 2000 tokens now stream in ~16 chunks instead of 2000 chunks).
Improved last chunk delivery time.
In addition to latency improvements, this update reduces unnecessary network overhead caused by the large number of chunks.

AWS IRSA-based Authentication Updates

Switched from the default global STS endpoint to regional STS endpoints (for Bedrock and S3 requests) to ensure proper token generation when the global STS is unavailable from the instance.

Provider Updates

Anthropic:
- Better error handling for error type stream chunks returned by the provider.
- Pricing updates for Claude 3.7 models across Anthropic, Bedrock and VertexAI.

1.10.9

2025-02-20

v1.10.9

Redis Cache Optimization

Updated cache implementation to avoid redundant Redis calls to improve overall performance.

VertexAI Service Account Token Caching

Implemented caching for Vertex service account token. Previously, tokens were being regenerated on every request despite having 1-hour validity.
This will reduce VertexAI request latency by 50-100ms per request.

Provider Updates

Google and VertexAI
- Handled tool call response parsing when there is one part tool call and one part text.
- Made the default/empty usage object compliant with OpenAI for streaming response.

1.10.8

2025-02-13

v1.10.8

Mutator Webhooks

The existing webhook plugin now has mutation capability.
This can be used for use-cases like BYO-PII redaction guardrail.

Configurable Timeouts for Guardrails

It is now possible to set timeout values for Guardrail execution. The current default value is 5 seconds.
timeout parameter can be used for all the guardrails that make a fetch call internally.
It is also possible to store this timeout value in control plane while creating/updating a Guardrail on UI.

Provider Updates

AzureOpenAI: Added support for stream_options parameter.

1.10.7

2025-02-10

v1.10.7

Fixes and Enhancements

Fix: Allow empty body in POST and PUT requests. Gateway was adding empty object as a default body for POST and PUT requests. This caused issues for APIs like POST assistants cancel or POST batches cancel where the upstream provider does not accept body at all.

1.10.6

2025-02-07

v1.10.6

Unified Batches APIs for AzureOpenAI

Extended the unified batches APIs to support AzureOpenAI batching.

Provider Updates

Deepseek Models: Added support for Deepseek models across multiple inference providers like Fireworks, Groq and Together.

Fixes and Enhancements

Chore: Allow budget exhausted user API keys to view logs. Control plane uses user API keys to fetch UI logs from the Gateway. Budget exhaustion of these keys should not have blocked logs view.

1.10.5

2025-02-06

v1.10.5

JWT Auth

Added support for JWT based authentication and authorization.
Customers can configure their JWKS endpoint or the JWKS JSON.

Unified Batches APIs for VertexAI

Extended the unified batches APIs to support VertexAI batching.

Provider Updates

Google and VertexAI: Updated the Grounding implementation to support their new API signatures. Docs Link
AWS Bedrock: Handle edge cases for AWS Bedrock file uploads.

Fixes and Enhancements

Logging: Added exception details like cause and name in logs for provider level fetch failures.
Caching: Enabled caching even when the debug flag is set to false.

1.10.4

2025-01-29

v1.10.4

PII Redaction Guardrails

Added PII Redaction Guardrails through multiple guardrail providers:
- Portkey Managed
- AWS Bedrock
- Pangea
- Patronus
- Promptfoo
If any entities were redacted from request/response, the guardrail result object in the final response will contain a flag named transformed set to true.

Request Metadata Logging Updates

Workspace metadata will now logged on individual request level.

New Providers

Replicate: Now supported for proxy (passthrough) requests.

Fixes and Enhancements

Guardrails: Added ability to override default guardrail credentials (stored in control plane) with custom credentials at runtime.

1.10.3

2025-01-23

v1.10.3

AzureOpenAI Unified Finetuning Support

Extended the unified finetuning APIs to support AzureOpenAI provider.

AWS Bedrock Guardrails

AWS Bedrock Guardrails are now supported for request/response checks.
Here is a short document which can be used to setup this with Portkey.

Virtual Keys for Custom Models/Providers

It is now possible to configure custom host and custom headers directly in the virtual keys.
If your custom model’s API signature matches any of our existing providers, you can create a virtual key with your custom settings.
While this functionality was already available, it has now been integrated directly into virtual keys for more streamlined configuration.

Prometheus Metric Updates

Updated the units for LLM request duration histogram metrics to milliseconds. The label has been renamed from llm_request_duration_seconds to llm_request_duration_milliseconds
Added a new metric named portkey_request_duration_milliseconds to track Portkey’s processing latency.

New Providers

Milvus DB: Supported as a passthrough provider.

Provider Updates

VertexAI and Google Improvements
- Added logprobs support compatible with OpenAI format via logprobs and top_logprobs parameters
- Added support for experimental Gemini Thinking Models.
- Added tool parameters JSON schema handling to ignore/skip fields which are not compatible with these 2 providers.
Anthropic: Added total_tokens in stream response to make it compliant with OpenAI spec.

1.10.2

2025-01-14

v1.10.2

Provider Updates

VertexAI: VertexAI requests that sent the virtual key and config headers separately were failing with a provider 401 error. This was happening specifically for VertexAI requests where the virtual key was sent as a separate header along with a config header.

1.10.1

2025-01-13

v1.10.1

Unified Finetune APIs

Added unified finetune APIs for OpenAI, AzureOpenAI, Bedrock and Fireworks.

Fixes and Enhancements

Code Detection Guardrail Updates: Added checks for verbose identifiers to detect python and js markdown code blocks. Example: check for python and javascript along with py and js identifiers.

1.10.0

2025-01-03

v1.10.0

Unified Batches and Files API

Added unified batching APIs for OpenAI, AWS Bedrock and Cohere
Docs Link

Improved Batch Management for Analytics Data Inserts

Improved Clickhouse batch management to prevent log drops.
Notable reduction in memory usage growth and spikes compared to previous builds.
We also recommend changing ANALYTICS_STORE env to control_plane (for hybrid deployments) so that batching/retries can managed by Portkey.

Gateway Docker Image Size Reduction:

Made some updates to the image build process, reducing the size (compressed) from ~275MB to ~75MB.

VertexAI Self-Deployed Models (a.k.a Endpoints in Vertex):

You can now use self-deployed models from VertexAI. This update also supports Vertex-Huggingface models.

Shorthand Format For Guardrails In Config:

Added input_guardrails and output_guardrails fields in config which accept array of guardrail slugs.

Guardrail Output Explanation

Guardrails responses now include an explanation property to clarify why checks passed or failed.
This property is currently only available for default checks.

OpenAI `developer` Role Support Across All Providers:

For OpenAI and AzureOpenAI, the role will be mapped as expected.
For other providers, the developer role is mapped to the system role (or its equivalent).

New Partner Guardrails

Mistral (mistral.moderateContent): Guard against different type of contents like hate_and_discrimination, violence_and_threats, etc.
Pangea (pange.textGuard): Guard against malicious content and other undesirable data.

Provider Updates

Cohere: Removed unsupported stream parameter from the Bedrock-Cohere integration.

Fixes and Enhancements

Image Cost Calculation: Updated the image calculation logic to handle different quality, size, etc. combinations.
ValidURL Guardrail: Updated the URL extraction logic to handle more edge cases.
Prompt Render Error Message: Prompt render API (/render) is a control plane API. Added detailed message to highlight this in case a user tries to use this API on their deployed Gateway.

1.9.5

2024-12-17

v1.9.5

Gemini Grounding Mode Support

Added Gemini grounding mode support in OpenAI compatible tools format.
Docs Link

Provider Updates

Groq: Fixed finish_reason mapping for streaming response.
AWS Bedrock: fixed the index mapping for tool call streaming response.
VertexAI: fixed final model param mapping for VertexAI Meta partner models.

Fixes and Enhancements

Proxy (Passthrough) Requests: fixed audio/* content-type passthrough request handling.

1.9.4

2024-12-11

v1.9.4

Enhanced Request/Response Logging

Added comprehensive logging for all request/response phases:
- Original request
- Transformed request
- Original response
- Transformed response

Prometheus Metrics Standardization

Standardized all Prometheus metric labels to use a consistent set:
- method
- route
- code
- custom_labels
- provider
- model
- source

Provider Updates

Ollama and Groq
- Added support for tools.

1.9.3

2024-12-06

v1.9.3

Allow All S3-compatible Log Stores

Added a new LOG_STORE type named S3_CUSTOM which can be used to integrate any S3-compatible storage service for request logging.
The custom host for the storage provider can be set in LOG_STORE_BASEPATH.

New Provider - AWS Sagemaker

AWS Sagemaker models can now be used through Gateway as passthrough requests.
Unified API signature is not yet possible because Sagemaker inherits the request body structure from the underlying model.
Docs Link

1.9.2

2024-11-29

v1.9.2

Proxy (Passthrough) Request Enhancements

Added streamlined support for virtual keys and configs in proxy (passthrough) requests.

Prompt Labels

Added support for labelled prompt cache invalidation whenever an update happens on control plane side.
NOTE: Prompt labels is a control plane change and has no major updates in Gateway apart from cache key invalidation for labelled prompt keys.
Docs Link

S3 Integration Enhancements

Allow sub-paths in bucket name for logs.

Provider Updates

Perplexity: Allow citations in response if strict_open_ai_compliance flag is set to false.
AWS Bedrock
- Stringify the response tool arguments to make it OpenAI compliant.
- Merge successive user messages to avoid Bedrock errors.
Openrouter: Handle cost calculation when input model is openrouter/auto.
Google: Fix the mapping for code in error response.

1.9.1

2024-11-25

v1.9.1

Provider Updates

OpenAI and AzureOpenAI
- For Realtime APIs, the socket close event now retains the original close reason returned by the provider.
- Added support for newly released prediction, store, metadata, audio and modalities parameters.
AWS Bedrock: Fixed an issue where an extra newline character was being returned in the AWS Bedrock response.

1.9.0

2024-11-20

v1.9.0

Dynamic Budgets and Auto Expiry for API Keys and Virtual Keys

Introduced support for setting dynamic budgets and auto-expiry for API keys and virtual keys.

Realtime API Integration

Added Realtime APIs integration for OpenAI and AzureOpenAI.
Docs Link

Provider Updates

VertexAI: Fixed structured outputs integration for VertexAI when using JS SDK. The SDK was adding extra fields in the JSON schema that were incompatible with Vertex’s API requirements.

1.8.4

2024-11-13

v1.8.4

Provider Updates

Azure OpenAI: Added encoding_format and dimensions as supported params.

Fixes & Enhancements

Updated the default behaviour to use IMDS/Service account role for Bedrock and S3.

1.8.3

2024-11-12

v1.8.3

Fixes & Enhancements

Fixed implementation conflicts of existing AWS AssumeRole implementation with the newly released IRSA (IAM Roles for Service Accounts) Assume Role and IMDS (Instance Metadata Service) Assume Role auth approaches.

1.8.2

2024-10-31

v1.8.2

Fixes and Enhancements

Added a new Prometheus metric to track LLM-only latency. Label name: llm_request_duration_seconds

1.8.1

2024-10-30

v1.8.1

Control Plane Log Store

Added a new log and analytics store named control_plane.
Setting LOG_STORE and ANALYTICS_STORE environment variables as control_plane will route all logs and analytics to the control plane and will eliminate the need of having Clickhouse connection on Gateway.

1.8.0

2024-10-25

v1.8.0

Bedrock Converse API integration

Bedrock’s /chat/completions have been updated to use Bedrock converse API.
This enables features like tool calls, vision, etc. for many bedrock models.
This also removes the hassle of maintaining chat templating logic for llama and mistral models.

VertexAI Image Generation

Added support for Vertex Imagen models.

Stable Diffusion v2 Models

StabilityAI introduced v2 models with a new API signature. Gateway now supports both v1 and v2 models, with internal transformations for different API signatures.
Supported for both stability-ai and bedrock providers.
New models: Stable Image Ultra, Core, 3.0 and 3.5.

Pydantic SDK Integration for Structured Outputs

Done for GoogleAI and VertexAI (follows OpenAI)
We previously added support for structured outputs through REST API. However, SDKs using Pydantic were not supported due to extra fields in the JSON schema.
Added a dereferencing function that converts JSON schemas from the library to Google-compatible schemas.

OpenAI and AzureOpenAI Prompt Cache Pricing

Added support for handling prompt caching pricing for required models.

New Providers

Lambda (lambda): Supports chat completions and completions.

Provider Updates

Perplexity: Added the missing [DONE] chunk for stream calls to comply with OpenAI’s spec.
VertexAI: Fixed provider name extraction logic for meta models, so users can send it like other partner models (e.g., meta.<model-name>).
Google: Added structured outputs support (similar to Vertex-ai).

Fixes & Enhancements:

Exclude files, batches, threads, etc. (all passthrough) from llm_cost_sum prometheus metric to avoid unnecessary labels.

Monthly Summary

Enterprise Releases

Product Releases

SDK Releases

Schedule Call

​v1.11.6

​Provider Updates

​Improvements and Fixes

​v1.11.5

​Guardrails

​v1.11.4

​Provider Updates

​Cache Improvements

​Fixes And Improvements

​v1.11.3

​Observability

​Fixes And Improvements

​v1.11.2

​Fixes And Improvements

​v1.11.1

​Provider Updates

​OTEL Traces Collector endpoints

​Log Exports available on Data Plane

​Fixes And Improvements

​v1.11.0

​Provider Updates

​Fixes And Improvements

​v1.10.23

​Provider Updates

​Plugins

​Fixes And Improvements

​v1.10.22

​Enhanced Anthropic PDF Support

​OTEL Configurations

​New Providers

​Provider Updates

​Fixes And Improvements

​v1.10.21

​AWS Bedrock Prompt Caching

​VertexAI Gemini 2.5 Thinking Param Support

​General Purpose File Upload For VertexAI

​Provider Updates

​Fixes And Improvements

​v1.10.20

​OpenAI Embeddings Latency Improvements

​Strict Metadata Enforcement

​Strict Default Config Enforcement

​Provider Updates

​Fixes And Improvements

​v1.10.19

​OTEL Metrics Push

​Milvus Vector Store for Semantic Caching

​Azure Guardrails Support

​Prompts Render Endpoint

​Provider Updates

​Minor Enhancements

​v1.10.18 (Redacted)

​Redaction notice

​Workspace Level Usage and Rate Limits

​OTEL Metrics Push

​Milvus Vector Store for Semantic Caching

​Azure Guardrails Support

​Provider Updates

​v1.10.17

​OpenAI and Azure OpenAI Response API

​Azure AI Foundry Enhancements

​Retry Enhancements

​Configurable Default Cache TTL

​Provider Updates

​Improvements

​v1.10.16

​Improvements

​v1.10.15

​Improvements

​v1.10.14

​Enforce Organisation And Workspace Guardrails

​Unified Finetuning APIs for Fireworks

​Pricing Updates

​Updated Unified API Signature For Anthropic Extended Thinking

​Unified Batches API Improvements

v1.11.6

Provider Updates

Improvements and Fixes

v1.11.5

Guardrails

v1.11.4

Provider Updates

Cache Improvements

Fixes And Improvements

v1.11.3

Observability

Fixes And Improvements

v1.11.2

Fixes And Improvements

v1.11.1

Provider Updates

OTEL Traces Collector endpoints

Log Exports available on Data Plane

Fixes And Improvements

v1.11.0

Provider Updates

Fixes And Improvements

v1.10.23

Provider Updates

Plugins

Fixes And Improvements

v1.10.22

Enhanced Anthropic PDF Support

OTEL Configurations

New Providers

Provider Updates

Fixes And Improvements

v1.10.21

AWS Bedrock Prompt Caching

VertexAI Gemini 2.5 Thinking Param Support

General Purpose File Upload For VertexAI

Provider Updates

Fixes And Improvements

v1.10.20

OpenAI Embeddings Latency Improvements

Strict Metadata Enforcement

Strict Default Config Enforcement

Provider Updates

Fixes And Improvements

v1.10.19

OTEL Metrics Push

Milvus Vector Store for Semantic Caching

Azure Guardrails Support

Prompts Render Endpoint

Provider Updates

Minor Enhancements

v1.10.18 (Redacted)

Redaction notice

Workspace Level Usage and Rate Limits

OTEL Metrics Push

Milvus Vector Store for Semantic Caching

Azure Guardrails Support

Provider Updates

v1.10.17

OpenAI and Azure OpenAI Response API

Azure AI Foundry Enhancements

Retry Enhancements

Configurable Default Cache TTL

Provider Updates

Improvements

v1.10.16

Improvements

v1.10.15

Improvements

v1.10.14

Enforce Organisation And Workspace Guardrails

Unified Finetuning APIs for Fireworks

Pricing Updates

Updated Unified API Signature For Anthropic Extended Thinking

Unified Batches API Improvements

Logging Updates

Provider Updates

Fixes and Improvements

v1.10.13

Unified Batches APIs for VertexAI Embedding