April in review ◀️✨

We kicked off April with an announcement: we make 95% of LLM costs vanish overnight. It was just bait, and some of you bit (≧ᗜ≦)

While we can’t make bills disappear with a snap, we’ve delivered some powerful upgrades this month that will help you build and ship robust, reliable GenAI apps, faster!

This month, we introduced updates to the platform and gateway around governance, security & guardrails, new integrations, and all the latest models! Along with this, we’re working on something bigger: the missing piece in the AI agents stack!

Here’s what we shipped last month:

Summary

Platform

  • Prompt CRUD APIs
  • Export logs to your internal stack
  • Budget limits and rate limits on workspaces
  • n8n integration
  • OpenAI Codex CLI integration
  • New retry setting to determine wait times
  • Milvus for semantic cache
  • Plugins moved to org-level Settings
  • Virtual Key exhaustion alerts now include the workspace
  • Workspace control setup option

Gateway & Providers

  • OpenAI embeddings latency improvement (~200ms)
  • Responses API for OpenAI & Azure OpenAI
  • Bedrock prompt caching via the unified API
  • Virtual keys for self-hosted models
  • Tool calling support for Groq, OpenRouter, and Ollama
  • New providers: Dashscope, Recraft AI, Replicate, Azure AI Foundry
  • Enhanced parameter support: OpenRouter, Vertex AI, Perplexity, Bedrock
  • Claude’s anthropic_beta parameter for the Computer Use beta

Technical Improvements

  • Unified caching and logging of thinking responses
  • Strict metadata logging: Workspace > API Key > Request
  • Prompt render endpoint available on the Gateway URL
  • API key default configs now locked from overrides

New Models & Integrations

  • GPT-4.1
  • Gemini 2.5 Pro and Flash
  • Llama 4 via Fireworks, Together, and Groq
  • o1-pro
  • gpt-image-1
  • Qwen 3
  • Audio models via Groq

Guardrails

  • Azure AI Content Safety integration
  • Exa Online Search as a guardrail

Platform

Prompt CRUD APIs

Prompt CRUD APIs give you the control to scale by enabling you to:

  • Programmatically create, update, and delete prompts
  • Manage prompts in bulk or version-control them
  • Integrate prompt updates into your own tools and workflows
  • Automate updates for A/B testing and rapid experimentation

Read more about this here.
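If you manage prompts from code, the flow might look like this minimal sketch. The endpoint paths, payload fields, and response shape below are illustrative assumptions rather than the documented contract; check the Prompt API reference for the exact shapes.

```python
# Minimal sketch of programmatic prompt management over REST.
# NOTE: endpoint paths, payload fields, and response fields here are
# assumptions for illustration; verify them against the Portkey docs.
import requests

BASE_URL = "https://api.portkey.ai/v1"
HEADERS = {"x-portkey-api-key": "PORTKEY_API_KEY"}

# Create a prompt (assumed endpoint and fields)
created = requests.post(
    f"{BASE_URL}/prompts",
    headers=HEADERS,
    json={
        "name": "support-triage",
        "string": "Classify this ticket: {{ticket_text}}",
        "model": "gpt-4.1",
    },
).json()
prompt_id = created["id"]  # assumed response field

# Update it, e.g. to roll out a B variant in an A/B test
requests.put(
    f"{BASE_URL}/prompts/{prompt_id}",
    headers=HEADERS,
    json={"string": "Triage the following ticket: {{ticket_text}}"},
)

# Delete it once the experiment is retired
requests.delete(f"{BASE_URL}/prompts/{prompt_id}", headers=HEADERS)
```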

Export logs to your internal stack

Enterprises can now push analytics logs to any OTEL-compliant store through Portkey to centralize monitoring, maintain compliance, and ensure efficient operations. See how it’s done here.

Budget limits and rate limits on workspaces

Configure budget and rate limits at the workspace level to:

  • Allocate specific budgets to different departments, teams, or projects
  • Prevent individual workspaces from consuming disproportionate resources
  • Ensure equitable API access and complete visibility

n8n integration

Add enterprise-grade controls to your n8n workflows with:

  • Unified AI Gateway: Connect to 1600+ models with full API key management—not just OpenAI or Anthropic.
  • Centralized observability: Track 40+ metrics and request logs in real time.
  • Governance: Monitor spend, set budgets, and apply RBAC across workflows.
  • Security guardrails: Enable PII detection, content filtering, and compliance controls.

Read more about the integration here.

OpenAI Codex CLI integration

OpenAI Codex CLI gives developers a streamlined way to analyze, modify, and execute code directly from their terminal. Portkey’s integration enhances this experience with:

  • Access to 250+ additional models beyond OpenAI Codex CLI’s standard offerings
  • Content filtering and PII detection with guardrails
  • Real-time analytics and logging
  • Cost attribution, budget controls, RBAC, and more!

Read more about the integration here.

Other updates

  • Introduced a new retry setting, use_retry_after_header. When set to true, and the provider returns retry-after or retry-after-ms headers, the Gateway uses those headers to determine retry wait times instead of applying the default exponential backoff for 429 responses (see the config sketch after this list).
  • You can now store and retrieve vector embeddings for semantic cache using Milvus in Portkey. Read more about the semantic cache store here.
  • Plugins have now been moved under Settings (org-level) in the Portkey app.
  • Virtual Key exhaustion alert emails now include which workspace the exhausted key belonged to.
  • Set up your workspace with Workspace control on the Portkey app.
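For the new retry setting, here’s a minimal sketch of a gateway config attached to the Portkey Python client. Only use_retry_after_header comes from this release; the surrounding retry and cache blocks follow Portkey’s existing config shape, and the virtual key name is a placeholder.

```python
from portkey_ai import Portkey

config = {
    "retry": {
        "attempts": 3,
        # Honor the provider's retry-after / retry-after-ms headers on
        # 429s instead of the default exponential backoff.
        "use_retry_after_header": True,
    },
    # Semantic cache; the backing vector store (e.g. Milvus) is
    # configured on the Portkey side, not per request.
    "cache": {"mode": "semantic"},
}

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="OPENAI_VIRTUAL_KEY",  # placeholder
    config=config,
)

resp = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```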

Gateway & Providers

OpenAI embeddings response

We’ve optimized the Gateway’s handling of OpenAI embeddings requests, cutting response latency by around 200ms.

Responses API

You can now use the Responses API to access OpenAI and Azure OpenAI models on Portkey, enabling a flexible and easier way to create agentic experiences.

  • Complete observability and usage tracking
  • Caching support for streaming requests
  • Access to advanced tools — web search, file search, and code execution, with per-tool cost tracking
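A minimal sketch of what a call could look like, assuming the Portkey SDK mirrors OpenAI’s responses.create surface; the virtual key name and tool choice are placeholders, so verify the exact method names against the docs.

```python
from portkey_ai import Portkey

client = Portkey(api_key="PORTKEY_API_KEY", virtual_key="OPENAI_VIRTUAL_KEY")

# Assumes the SDK exposes OpenAI's Responses API as client.responses.
response = client.responses.create(
    model="gpt-4.1",
    input="Summarize this week's AI model releases.",
    tools=[{"type": "web_search_preview"}],  # tool usage is cost-tracked per tool
)
print(response.output_text)  # convenience accessor, as in OpenAI's SDK
```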

Bedrock prompt caching

You can now implement Amazon Bedrock’s prompt caching through our OpenAI-compliant unified API and prompt templates.

  • Cache specific portions of your requests for repeated use
  • Reduce inference response latency and input token costs

Read more about the implementation here
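A minimal sketch, assuming Portkey passes Anthropic-style cache_control blocks through to Bedrock on the OpenAI-compatible chat endpoint; the model ID and prompt are placeholders, and the exact field placement should be checked against the doc linked above.

```python
from portkey_ai import Portkey

client = Portkey(api_key="PORTKEY_API_KEY", virtual_key="BEDROCK_VIRTUAL_KEY")

LONG_SYSTEM_PROMPT = "...your full policy manual or few-shot corpus..."

resp = client.chat.completions.create(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0",  # placeholder Bedrock model ID
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": LONG_SYSTEM_PROMPT,
                    # Anthropic-style marker: cache this block so repeat
                    # requests skip reprocessing these input tokens.
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {"role": "user", "content": "Where do I change my billing address?"},
    ],
)
```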

Virtual keys for self-hosted models

You can now create a virtual key for any self-hosted model, whether you’re running Ollama, vLLM, or any custom/private model.

  • No extra setup required
  • Stay in control with logs, traces, and key metrics
  • Manage all your LLM interactions through one interface
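Once a virtual key pointing at your self-hosted endpoint exists, requests route through the Gateway like any hosted provider. A minimal sketch (the virtual key name and model are placeholders):

```python
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="SELF_HOSTED_VLLM_VK",  # placeholder; points at your vLLM/Ollama URL
)

resp = client.chat.completions.create(
    model="llama-3-8b-instruct",  # whatever your server actually serves
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)  # logged, traced, and metered in Portkey
```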

Advanced capabilities

  • OpenRouter: Added mappings for new parameters: modalities, reasoning, transforms, provider, models, and response_format.
  • Vertex AI: Added support for explicitly specifying mime_type for URLs sent in the request. Gemini 2.5 thinking parameters are now available.
  • Perplexity: Added support for response_format and search_recency_filter request parameters.
  • Bedrock: You can now pass the anthropic_beta parameter in Bedrock’s Anthropic API via Portkey to enable Claude’s Computer use beta feature.

Tool calling

Portkey now supports tool calling for Groq, OpenRouter, and Ollama.
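Tool calling uses the familiar OpenAI-style request shape. Here’s a minimal sketch routed to Groq; the same shape applies with OpenRouter or Ollama virtual keys, and the tool definition and model are placeholders.

```python
from portkey_ai import Portkey

client = Portkey(api_key="PORTKEY_API_KEY", virtual_key="GROQ_VIRTUAL_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # placeholder tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # the model's requested tool invocations
```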

New Providers

Dashscope

Integrate with Dashscope

Recraft AI

Generate production-ready visuals with Recraft

Replicate

Run open-source models via simple APIs with Replicate

Azure AI Foundry

Access over 1,800 models with Azure AI Foundry

Technical Improvements

  • Caching and logging of unified thinking responses: Unified thinking responses (content_blocks) are now logged and cached for streaming responses.
  • Strict metadata enforcement: The metadata logging precedence is now Workspace Default > API Key Default > Incoming Request. This gives org admins better control and ensures values they set are not overridden (see the sketch after this list).
  • Prompt render endpoint: Previously only available via the control plane, the prompt render endpoint is now supported directly on the Gateway URL.
  • Default config in an API key can no longer be overridden.
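For the metadata precedence change, a minimal sketch of request-side metadata (the keys and values here are placeholders; metadata can also travel on the x-portkey-metadata header):

```python
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="OPENAI_VIRTUAL_KEY",
    metadata={"_user": "user_123", "env": "staging"},  # request-side values
)

resp = client.chat.completions.create(
    model="gpt-4.1",
    messages=[{"role": "user", "content": "Hello"}],
)
# Under the new precedence, if the workspace default also sets "env",
# the workspace value is what gets logged, not "staging".
```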

New Models & Integrations

GPT-4.1

OpenAI’s new model for faster and improved responses

Gemini 2.5 Pro

Google’s most advanced model

Gemini 2.5 Flash

Google’s fast, cost-efficient thinking model

Llama 4

Meta’s latest model via Fireworks, Together, and Groq

o1-pro

OpenAI’s model for better reasoning and consistent answers

gpt-image-1

OpenAI’s latest image generation capabilities

Qwen 3

Alibaba’s latest model with hybrid reasoning

Audio models

Access audio models via Groq

Guardrails

  • Azure AI Content Safety: Use Microsoft’s content filtering solution to moderate inputs and outputs across supported models.

  • Exa Online Search: You can now configure Exa Online Search as a Guardrail in Portkey to enable real-time, grounded search across the web before answering. This makes any LLM capable of handling current events or live queries without needing model retraining.

Documentation

Administration Docs

We’ve made significant improvements to our documentation:

  • Virtual keys access: Defining who can view and manage virtual keys within workspaces. Learn more
  • API keys access: Control how workspace managers and members interact with API keys within their workspaces. Learn more

Community

Here’s a tutorial on building a customer support agent with LangGraph and Portkey. Shoutout to Nerding I/O!

Customer love!

Partner blog

See how Portkey and Pillar together can help you build secure GenAI apps for production.

Community Contributors

A special thanks to our community contributors this month:

Coming this month!

We’re changing how agents go to production, from first principles. Watch out for this 👀

Support