v1.10.15 (2025-04-02)


Improvements

  • File Upload:
    • Support for uploading large files to Providers and Data Service
  • Allow users to pass custom MIME types in the request body via the mime_type field. For example:
    {
        "model": "gemini-1.5-pro",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant!"
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "What's in this image?"
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "<image-url>",
                            "mime_type": "image/jpeg"
                        }
                    }
                ]
            }
        ]
    }

v1.10.14 (2025-03-28)


Enforce Organisation And Workspace Guardrails

Unified Finetuning APIs for Fireworks

  • Extended the existing unified finetuning APIs to support Fireworks provider.

Pricing Updates

  • Added support for calculating Perplexity search cost and Gemini grounding cost.

Updated Unified API Signature For Anthropic Extended Thinking

  • Updated the unified API signature for Extended Thinking (introduced in v1.10.12) to ensure that the OpenAI-compliant fields of the response remain untouched regardless of the strict_open_ai_compliance flag.

Unified Batches API Improvements

  • custom_id will be preserved in the VertexAI batch output.
  • Fixed some issues with batches cost calculation.

Logging Updates

  • Non-OpenAI-compliant fields like groundingMetadata (Gemini Grounding), citations (Perplexity Search) and extended thinking responses are now logged for stream responses. Previously, these fields were omitted from logs for streaming responses.

Provider Updates

  • Fireworks: Added support for logprobs and top_logprobs parameters.

Fixes and Improvements

  • Added a new environment variable (AWS_ENDPOINT_DOMAIN) that can be used to override the default domain (amazonaws.com).
  • Fixed an edge case where before_request_hook failures were not flagged with a 246 response status code for cached and non-cached stream responses.

v1.10.13 (2025-03-20)


Unified Batches APIs for VertexAI Embedding

  • Added support for batch processing of embeddings with Vertex AI.

Provider Updates

  • AWS Bedrock
    • Multi-Turn Conversation With Tools:
      • Handled assistant messages where content is set as null and tool_calls are passed.
  • OpenAI
    • Fixed an edge case (introduced in the previous version) which was causing issues in cost calculation of fine-tuned models.

Fixes and Improvements

  • Fixed batch pricing calculation issue for VertexAI and Anthropic Bedrock models.
  • Fixed an edge case where the x-portkey-retry-attempt-count response header was set to -1 even when no retries were configured.
  • Improved handling to skip stream-mode detection for irrelevant request types. For example, stream-mode detection is now skipped for GET requests, since streaming is not supported for them.
  • Removed redundant AWS credential fetch failures at boot time.

v1.10.12 (2025-03-18)


Real-Time Model Pricing Sync

  • Model pricing configs are no longer coupled with gateway builds.
  • For hybrid deployments, model pricing configs will be fetched from the control plane.

Unified API Signature For Anthropic Thinking

  • Introduced a unified API signature to support single-turn and multi-turn conversations with Anthropic Extended Reasoning across Anthropic, AWS Bedrock and VertexAI.

Prometheus Metric Updates

  • Added a new metric (llm_last_byte_diff_duration_milliseconds) to track LLM last byte latency for chunked JSON responses.
  • Added a new label (stream) for all metrics. Possible values: 0/1

Guardrails Updates

  • AWS Bedrock: Added handling to flag regex patterns returned by the guardrail.

Provider Updates

  • Azure OpenAI: Mapped the correct model name from multi-deployment virtual keys.

Fixes and Improvements

  • Portkey 500s are now logged in the console for debugging.

Internal Pod-to-Pod HTTPS Support

  • Added support for internal pod-to-pod HTTPS communication.
  • This can be enabled by mounting a volume containing a certificate and key.
  • The TLS_KEY_PATH and TLS_CERT_PATH environment variables are used to locate the key and certificate on the mounted volume, as sketched below.
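
  As a rough illustration, the wiring might look like the following Kubernetes container fragment, written here as a TypeScript object so the assumptions can be annotated; the volume name and mount paths are assumptions, and only the two environment variables come from this note.

    // Sketch: mount a cert/key volume and point the Gateway at it.
    // Volume name and file paths are illustrative assumptions.
    const containerFragment = {
      env: [
        { name: "TLS_CERT_PATH", value: "/etc/tls/tls.crt" },
        { name: "TLS_KEY_PATH", value: "/etc/tls/tls.key" },
      ],
      volumeMounts: [{ name: "tls-certs", mountPath: "/etc/tls", readOnly: true }],
    };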

v1.10.11 (2025-03-06)


Provider Updates

  • AWS Bedrock
    • Added support for encryption key usage when uploading files to S3.
  • VertexAI:
    • Minor updates to streamline the unified spec for batches and fine-tune APIs.
    • Updated pricing for gemini-2.0-flash-lite models.
    • Added support for webm mimeType.
  • OpenRouter
    • Mapped the usage object for streaming responses.
  • Azure Inference
    • Replaced extra-parameters: ignore with extra-parameters: drop due to deprecation by Azure.
  • OpenAI and Azure OpenAI
    • Updated pricing for GPT-4.5 models.

v1.10.10 (2025-02-27)


Unified Finetuning APIs for VertexAI

  • Extended the existing unified finetuning APIs to support VertexAI.
  • File upload and transformations are handled according to each provider's requirements.

Body Params Support in Conditional Router

  • Added support for using params to specify request-body fields in conditional router queries; previously, only metadata-based routing was supported. A sketch is shown below.
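
  A minimal sketch of a conditional-router config that routes on a body field, written as a TypeScript object so the assumptions can be annotated inline. The query syntax mirrors the documented metadata-based routing; the specific field, operator, and target names are illustrative assumptions, not values from this release note.

    // Sketch: route on a request-body field via "params.*" (names/values illustrative).
    const conditionalConfig = {
      strategy: {
        mode: "conditional",
        conditions: [
          {
            // "params.model" targets the body's "model" field (assumed syntax,
            // mirroring the documented "metadata.*" queries).
            query: { "params.model": { $eq: "gemini-1.5-pro" } },
            then: "google-target",
          },
        ],
        default: "default-target",
      },
      targets: [
        { name: "google-target", virtual_key: "<google-virtual-key>" },
        { name: "default-target", virtual_key: "<default-virtual-key>" },
      ],
    };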

Streaming Cache Responses Optimization

  • Increased stream chunk content size from 1 token to 125 tokens for cached responses. This reduces the number of chunks significantly (e.g., 2000 tokens now stream in ~16 chunks instead of 2000 chunks).
  • Improved last chunk delivery time.
  • In addition to latency improvements, this update reduces unnecessary network overhead caused by the large number of chunks.

AWS IRSA-based Authentication Updates

  • Switched from the default global STS endpoint to regional STS endpoints (for Bedrock and S3 requests) to ensure proper token generation when the global STS is unavailable from the instance.

Provider Updates

  • Anthropic:
    • Better error handling for error type stream chunks returned by the provider.
    • Pricing updates for Claude 3.7 models across Anthropic, Bedrock and VertexAI.

v1.10.9 (2025-02-20)


Redis Cache Optimization

  • Updated cache implementation to avoid redundant Redis calls to improve overall performance.

VertexAI Service Account Token Caching

  • Implemented caching for Vertex service account token. Previously, tokens were being regenerated on every request despite having 1-hour validity.
  • This will reduce VertexAI request latency by 50-100ms per request.

Provider Updates

  • Google and VertexAI
    • Handled tool-call response parsing when the response contains one tool-call part and one text part.
    • Made the default/empty usage object compliant with OpenAI for streaming response.

v1.10.8 (2025-02-13)


Mutator Webhooks

  • The existing webhook plugin now has mutation capability.
  • This can be used for use cases like bring-your-own PII-redaction guardrails.

Configurable Timeouts for Guardrails

  • It is now possible to set timeout values for guardrail execution. The current default is 5 seconds.
  • The timeout parameter can be used with all guardrails that make a fetch call internally, as sketched below.
  • This timeout value can also be stored in the control plane when creating or updating a guardrail in the UI.
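
  As a sketch, a timeout might be attached to a fetch-based check like this (a TypeScript object for annotation; the check id, URL, field placement, and millisecond unit are assumptions, and only the timeout parameter and the 5-second default come from this note).

    // Sketch: guardrail check with an execution timeout (default is 5 seconds).
    const guardrailCheck = {
      id: "<webhook-style-check>", // illustrative: any guardrail that fetches internally
      parameters: {
        webhookURL: "https://guardrails.example.com/check", // illustrative endpoint
        timeout: 3000, // assumed milliseconds; overrides the 5s default
      },
    };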

Provider Updates

  • AzureOpenAI: Added support for stream_options parameter.

v1.10.7 (2025-02-10)


Fixes and Enhancements

  • Fix: Allow an empty body in POST and PUT requests. The Gateway was adding an empty object as the default body for POST and PUT requests, which caused issues for APIs like POST assistants cancel or POST batches cancel where the upstream provider does not accept a body at all.

v1.10.6 (2025-02-07)


Unified Batches APIs for AzureOpenAI

  • Extended the unified batches APIs to support AzureOpenAI batching.

Provider Updates

  • Deepseek Models: Added support for Deepseek models across multiple inference providers like Fireworks, Groq and Together.

Fixes and Enhancements

  • Chore: Allow users with budget-exhausted API keys to view logs. The control plane uses user API keys to fetch UI logs from the Gateway, and budget exhaustion of these keys should not block the logs view.

v1.10.5 (2025-02-06)


JWT Auth

  • Added support for JWT-based authentication and authorization.
  • Customers can configure their JWKS endpoint or provide the JWKS JSON directly.
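
  Conceptually, JWKS-based verification works like the following sketch (using the jose library for illustration, not Portkey's internal code; the issuer and JWKS URL are placeholder assumptions).

    import { createRemoteJWKSet, jwtVerify } from "jose";

    // Fetches and caches signing keys from the configured JWKS endpoint
    // (URL is a placeholder, not a Portkey default).
    const jwks = createRemoteJWKSet(
      new URL("https://auth.example.com/.well-known/jwks.json"),
    );

    // Verifies the token's signature and issuer before authorizing the request.
    async function verifyRequestToken(token: string) {
      const { payload } = await jwtVerify(token, jwks, {
        issuer: "https://auth.example.com", // placeholder issuer
      });
      return payload; // claims can then drive authorization decisions
    }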

Unified Batches APIs for VertexAI

  • Extended the unified batches APIs to support VertexAI batching.

Provider Updates

  • Google and VertexAI: Updated the Grounding implementation to support their new API signatures. Docs Link
  • AWS Bedrock: Handled edge cases for Bedrock file uploads.

Fixes and Enhancements

  • Logging: Added exception details like cause and name in logs for provider level fetch failures.
  • Caching: Enabled caching even when the debug flag is set to false.

v1.10.4 (2025-01-29)


PII Redaction Guardrails

  • Added PII Redaction Guardrails through multiple guardrail providers:
    • Portkey Managed
    • AWS Bedrock
    • Pangea
    • Patronus
    • Promptfoo
  • If any entities were redacted from the request/response, the guardrail result object in the final response will contain a flag named transformed set to true.

Request Metadata Logging Updates

  • Workspace metadata is now logged at the individual request level.

New Providers

  • Replicate: Now supported for proxy (passthrough) requests.

Fixes and Enhancements

  • Guardrails: Added ability to override default guardrail credentials (stored in control plane) with custom credentials at runtime.

v1.10.3 (2025-01-23)


AzureOpenAI Unified Finetuning Support

  • Extended the unified finetuning APIs to support AzureOpenAI provider.

AWS Bedrock Guardrails

  • AWS Bedrock Guardrails are now supported for request/response checks.
  • A short document describing how to set this up with Portkey is available in the docs.

Virtual Keys for Custom Models/Providers

  • It is now possible to configure custom host and custom headers directly in the virtual keys.
  • If your custom model’s API signature matches any of our existing providers, you can create a virtual key with your custom settings.
  • While this functionality was already available, it has now been integrated directly into virtual keys for more streamlined configuration.

Prometheus Metric Updates

  • Updated the units of the LLM request duration histogram metric to milliseconds. The metric has been renamed from llm_request_duration_seconds to llm_request_duration_milliseconds.
  • Added a new metric named portkey_request_duration_milliseconds to track Portkey’s processing latency.

New Providers

  • Milvus DB: Supported as a passthrough provider.

Provider Updates

  • VertexAI and Google Improvements
    • Added logprobs support compatible with the OpenAI format via the logprobs and top_logprobs parameters.
    • Added support for experimental Gemini Thinking Models.
    • Added tool-parameter JSON schema handling to skip fields that are not compatible with these two providers.
  • Anthropic: Added total_tokens in stream response to make it compliant with OpenAI spec.

v1.10.2 (2025-01-14)


Provider Updates

  • VertexAI: Fixed a provider 401 error for requests that sent the virtual key as a separate header alongside a config header.

v1.10.1 (2025-01-13)


Unified Finetune APIs

  • Added unified finetune APIs for OpenAI, AzureOpenAI, Bedrock and Fireworks.

Fixes and Enhancements

  • Code Detection Guardrail Updates: Added checks for verbose identifiers when detecting Python and JS markdown code blocks. For example, the check now looks for the python and javascript identifiers along with py and js.

v1.10.0 (2025-01-03)


Unified Batches and Files API

  • Added unified batching APIs for OpenAI, AWS Bedrock and Cohere
  • Docs Link

Improved Batch Management for Analytics Data Inserts

  • Improved ClickHouse batch management to prevent log drops.
  • Notable reduction in memory-usage growth and spikes compared to previous builds.
  • We also recommend setting the ANALYTICS_STORE environment variable to control_plane (for hybrid deployments) so that batching and retries can be managed by Portkey.

Gateway Docker Image Size Reduction

  • Made some updates to the image build process, reducing the size (compressed) from ~275MB to ~75MB.

VertexAI Self-Deployed Models (a.k.a. Endpoints in Vertex)

  • You can now use self-deployed models from VertexAI. This update also supports Vertex-Huggingface models.

Shorthand Format For Guardrails In Config

  • Added input_guardrails and output_guardrails fields in configs, which accept an array of guardrail slugs, as sketched below.
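
  A minimal sketch of the shorthand (a TypeScript object for annotation; the slugs and the virtual key are illustrative placeholders).

    // Sketch: reference guardrails by slug directly in a config.
    const config = {
      input_guardrails: ["pii-redaction-check"], // run on the incoming request (illustrative slug)
      output_guardrails: ["toxicity-check"], // run on the provider response (illustrative slug)
      targets: [{ virtual_key: "<provider-virtual-key>" }],
    };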

Guardrail Output Explanation

  • Guardrails responses now include an explanation property to clarify why checks passed or failed.
  • This property is currently only available for default checks.

OpenAI developer Role Support Across All Providers

  • For OpenAI and AzureOpenAI, the developer role is passed through as expected.
  • For other providers, the developer role is mapped to the system role (or its equivalent), as sketched below.
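
  A minimal sketch of this fallback mapping (not the Gateway's actual code):

    // Providers without a native "developer" role receive "system" instead.
    function mapDeveloperRole(role: string, providerSupportsDeveloper: boolean): string {
      return role === "developer" && !providerSupportsDeveloper ? "system" : role;
    }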

New Partner Guardrails

  • Mistral (mistral.moderateContent): Guards against different types of content, such as hate_and_discrimination and violence_and_threats.
  • Pangea (pangea.textGuard): Guards against malicious content and other undesirable data.

Provider Updates

  • Cohere: Removed unsupported stream parameter from the Bedrock-Cohere integration.

Fixes and Enhancements

  • Image Cost Calculation: Updated the image cost calculation logic to handle different combinations of quality, size, and other options.
  • ValidURL Guardrail: Updated the URL extraction logic to handle more edge cases.
  • Prompt Render Error Message: The prompt render API (/render) is a control-plane API. Added a detailed error message to highlight this when a user tries to call it on their deployed Gateway.

v1.9.5 (2024-12-17)


Gemini Grounding Mode Support

  • Added Gemini grounding mode support in an OpenAI-compatible tools format.
  • Docs Link

Provider Updates

  • Groq: Fixed finish_reason mapping for streaming response.
  • AWS Bedrock: Fixed the index mapping for tool call streaming response.
  • VertexAI: Fixed final model param mapping for VertexAI Meta partner models.

Fixes and Enhancements

  • Proxy (Passthrough) Requests: Fixed audio/* content-type passthrough request handling.

v1.9.4 (2024-12-11)


Enhanced Request/Response Logging

  • Added comprehensive logging for all request/response phases:
    • Original request
    • Transformed request
    • Original response
    • Transformed response

Prometheus Metrics Standardization

  • Standardized all Prometheus metric labels to use a consistent set:
    • method
    • route
    • code
    • custom_labels
    • provider
    • model
    • source

Provider Updates

  • Ollama and Groq
    • Added support for tools.

v1.9.3 (2024-12-06)


Allow All S3-compatible Log Stores

  • Added a new LOG_STORE type named S3_CUSTOM which can be used to integrate any S3-compatible storage service for request logging.
  • The custom host for the storage provider can be set in LOG_STORE_BASEPATH.
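
  For example, pointing request logging at a self-hosted MinIO server might look like this (a TypeScript object for annotation; the endpoint value is an assumption, and only these two variables are named in this note).

    // Sketch: route request logs to any S3-compatible store (endpoint illustrative).
    const logStoreEnv = {
      LOG_STORE: "S3_CUSTOM",
      LOG_STORE_BASEPATH: "https://minio.internal.example.com:9000",
    };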

New Provider - AWS Sagemaker

  • AWS Sagemaker models can now be used through Gateway as passthrough requests.
  • Unified API signature is not yet possible because Sagemaker inherits the request body structure from the underlying model.
  • Docs Link

v1.9.2 (2024-11-29)


Proxy (Passthrough) Request Enhancements

  • Added streamlined support for virtual keys and configs in proxy (passthrough) requests.

Prompt Labels

  • Added support for invalidating labelled prompt caches whenever an update happens on the control-plane side.
  • NOTE: Prompt labels are a control-plane feature; apart from cache-key invalidation for labelled prompt keys, there are no major Gateway changes.
  • Docs Link

S3 Integration Enhancements

  • Allow sub-paths in bucket name for logs.

Provider Updates

  • Perplexity: Allowed citations in the response when the strict_open_ai_compliance flag is set to false.
  • AWS Bedrock
    • Stringified the response tool arguments to make them OpenAI-compliant.
    • Merged successive user messages to avoid Bedrock errors.
  • OpenRouter: Handled cost calculation when the input model is openrouter/auto.
  • Google: Fixed the mapping of the code field in error responses.

v1.9.1 (2024-11-25)


Provider Updates

  • OpenAI and AzureOpenAI
    • For Realtime APIs, the socket close event now retains the original close reason returned by the provider.
    • Added support for newly released prediction, store, metadata, audio and modalities parameters.
  • AWS Bedrock: Fixed an issue where an extra newline character was being returned in the AWS Bedrock response.

v1.9.0 (2024-11-20)


Dynamic Budgets and Auto Expiry for API Keys and Virtual Keys

  • Introduced support for setting dynamic budgets and auto-expiry for API keys and virtual keys.

Realtime API Integration

  • Added Realtime APIs integration for OpenAI and AzureOpenAI.
  • Docs Link

Provider Updates

  • VertexAI: Fixed the structured outputs integration for VertexAI when using the JS SDK. The SDK was adding extra fields to the JSON schema that were incompatible with Vertex's API requirements.

v1.8.4 (2024-11-13)


Provider Updates

  • Azure OpenAI: Added encoding_format and dimensions as supported params.

Fixes & Enhancements

  • Updated the default behaviour to use the IMDS/service-account role for Bedrock and S3.

v1.8.3 (2024-11-12)


Fixes & Enhancements

  • Fixed implementation conflicts of existing AWS AssumeRole implementation with the newly released IRSA (IAM Roles for Service Accounts) Assume Role and IMDS (Instance Metadata Service) Assume Role auth approaches.

v1.8.2 (2024-10-31)


Fixes and Enhancements

  • Added a new Prometheus metric to track LLM-only latency. Metric name: llm_request_duration_seconds

v1.8.1 (2024-10-30)


Control Plane Log Store

  • Added a new log and analytics store named control_plane.
  • Setting the LOG_STORE and ANALYTICS_STORE environment variables to control_plane will route all logs and analytics to the control plane, eliminating the need for a ClickHouse connection on the Gateway.

v1.8.0 (2024-10-25)


Bedrock Converse API Integration

  • Bedrock’s /chat/completions route has been updated to use the Bedrock Converse API.
  • This enables features like tool calls, vision, etc. for many Bedrock models.
  • This also removes the hassle of maintaining chat-templating logic for Llama and Mistral models.

VertexAI Image Generation

  • Added support for Vertex Imagen models.

Stable Diffusion v2 Models

  • StabilityAI introduced v2 models with a new API signature. Gateway now supports both v1 and v2 models, with internal transformations for different API signatures.
  • Supported for both stability-ai and bedrock providers.
  • New models: Stable Image Ultra, Core, 3.0 and 3.5.

Pydantic SDK Integration for Structured Outputs

  • Implemented for GoogleAI and VertexAI (following the OpenAI approach).
  • We previously added support for structured outputs through the REST API; however, SDKs using Pydantic were not supported due to extra fields in the generated JSON schema.
  • Added a dereferencing function that converts JSON schemas produced by the library into Google-compatible schemas, as sketched below.
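
  A minimal sketch of the dereferencing idea, assuming schemas use local #/$defs/... references; this illustrates the approach, not the Gateway's actual implementation, and the choice of additionalProperties as a rejected field is an assumption.

    // Inline local $ref pointers and drop fields the Google APIs reject.
    // Cyclic references are not handled in this sketch.
    type Schema = Record<string, unknown>;

    function dereference(schema: Schema): Schema {
      const defs = (schema.$defs as Record<string, Schema>) ?? {};
      const walk = (node: unknown): unknown => {
        if (Array.isArray(node)) return node.map(walk);
        if (node && typeof node === "object") {
          const obj = node as Schema;
          if (typeof obj.$ref === "string") {
            // "#/$defs/Name" -> replace the pointer with the definition itself
            const name = obj.$ref.split("/").pop()!;
            return walk(defs[name] ?? {});
          }
          const out: Schema = {};
          for (const [key, value] of Object.entries(obj)) {
            // additionalProperties shown as an example of an unsupported field
            if (key === "$defs" || key === "additionalProperties") continue;
            out[key] = walk(value);
          }
          return out;
        }
        return node;
      };
      return walk(schema) as Schema;
    }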

OpenAI and AzureOpenAI Prompt Cache Pricing

  • Added support for prompt-caching pricing for the applicable models.

New Providers

  • Lambda (lambda): Supports chat completions and completions.

Provider Updates

  • Perplexity: Added the missing [DONE] chunk for stream calls to comply with OpenAI’s spec.
  • VertexAI: Fixed provider name extraction logic for meta models, so users can send it like other partner models (e.g., meta.<model-name>).
  • Google: Added structured outputs support (similar to VertexAI).

Fixes & Enhancements

  • Excluded files, batches, threads, etc. (all passthrough routes) from the llm_cost_sum Prometheus metric to avoid unnecessary labels.