January
Kicking off 2025 with major releases! 🎉
January marks a milestone for Portkey with our first industry report: we analyzed over 2 trillion tokens flowing through Portkey's Gateway to uncover how teams run LLMs in production.
We're also expanding our platform capabilities with advanced PII redaction, JWT authentication, comprehensive audit logs, a unified files & batches API, and support for private LLMs. The latest LLMs, including Deepseek R1, OpenAI's o3-mini, and Gemini's thinking model, are also integrated with Portkey.
Plus, we are attending the AI Engineer Summit in New York in February, and hosting in-person meetups in Mumbai & NYC.
Let’s dive in!
Summary
| Area | Key Updates |
|---|---|
| Benchmark | • Released the LLMs in Prod 2025 Report, analyzing 2T+ tokens<br>• Key finding: multi-LLM deployment is now standard<br>• Average prompt size up 4x, with up to 40% cost savings from caching |
| Security | • Advanced PII redaction with automatic standardized identifiers<br>• JWT authentication support for enterprise deployments<br>• Comprehensive audit logs for all critical actions<br>• Enforced metadata schemas for better governance<br>• Attach default configs & metadata to API keys<br>• Granular workspace management controls |
| Platform | • Unified API for files & batches across major providers<br>• Support for private LLM deployments<br>• Enhanced virtual keys with granular controls |
| New Models | • Deepseek R1 available across 7+ providers<br>• Added Gemini thinking model<br>• Support for Perplexity Sonar models<br>• o3-mini integration |
| Integrations | • AWS Bedrock Guardrails support<br>• Milvus DB & Replicate integrations<br>• Expanded Open WebUI support<br>• Guardrails for embedding requests |
| Community | • Deep dive into MCP and event-driven architecture for agentic systems |
Community | • We did a deep dive into MCP and event-driven architecture for agentic systems |
LLMs in Prod 2025 Report
Our comprehensive analysis of 2T+ tokens processed through Portkey's Gateway reveals how teams are deploying LLMs in production. Here are the key findings:
Multi-LLM is the New Normal
Despite OpenAI's dominance (>50% of production traffic), teams are actively implementing multi-LLM strategies for reliability and specialized use cases.
Prompts are Getting Complex
Average prompt size has increased 4x over the last year, indicating more sophisticated prompt-engineering techniques and more complex workloads.
Caching is Critical
Proper caching strategies deliver up to 40% cost savings, making caching a must-have for production deployments.
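If you want to try this on your own traffic, here is a minimal sketch of enabling Portkey's response caching through a request config. The virtual key name is a placeholder, and the exact parameter placement follows the Python SDK's conventions:

```python
# Minimal sketch: enable Portkey's response caching via a request config.
# The virtual key is a placeholder; cache mode can be "simple" or "semantic".
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="openai-virtual-key",  # placeholder
    config={"cache": {"mode": "semantic", "max_age": 3600}},
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is our refund policy?"}],
)
# Repeated (or, in semantic mode, similar) prompts are now served from cache.
```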
Read the full LLMs in Prod 2025 Report →
Platform
Advanced PII Redaction
We’ve significantly enhanced Portkey’s Guardrails with request mutation capabilities.
When any sensitive data (like email, phone number, SSN) is detected in user requests, our PII redaction automatically replaces it with standardized identifiers before it reaches the LLM. This works seamlessly across our entire guardrails ecosystem, including AWS Bedrock Guardrails, Patronus AI, Promptfoo, Pangea, and more.
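As a rough sketch of how this plugs in: you create the PII guardrail in the Portkey UI, then reference it from a config. The guardrail ID, virtual key, and the exact identifier format in the comment are illustrative:

```python
# Sketch: attach a PII-redaction guardrail to incoming requests via a config.
# "pii_guardrail_id" is illustrative — create the guardrail in the UI first.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="openai-virtual-key",  # placeholder
    config={"input_guardrails": ["pii_guardrail_id"]},
)

# If this message contains an email/phone/SSN, the guardrail rewrites it to a
# standardized identifier (e.g. an EMAIL_ADDRESS token) before the LLM sees it.
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Email me at jane.doe@acme.com"}],
)
```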
Unified Files & Batches API
Managing file uploads and batch processing across multiple LLM providers is now dramatically simpler. Instead of building provider-specific integrations, you can do the following (a code sketch follows the list):
- Upload once, use everywhere: test your data across different foundation models
- Run A/B tests seamlessly across providers
- Choose between native provider batching or Portkey's custom batch API
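A hedged sketch of the unified flow, using the OpenAI-compatible files and batches surface; the virtual key name is a placeholder, and the JSONL file is assumed to follow the standard batch-request format:

```python
# Sketch: upload a JSONL of requests once, then run a batch against whichever
# provider the virtual key points to.
from portkey_ai import Portkey

client = Portkey(api_key="PORTKEY_API_KEY", virtual_key="bedrock-virtual-key")  # placeholder

# Upload the batch input file (same shape as OpenAI's batch format)
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

# Kick off the batch job
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)
```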
Integrate Private LLMs
You can now add your privately hosted LLMs to Portkey’s virtual keys. Simply:
- Configure your model’s base URL
- Set required authentication headers
- Start routing requests through our unified API
This means you can use your private deployments alongside commercial providers, with the same monitoring, reliability, and management features.
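A minimal sketch of what that looks like once the deployment is saved as a virtual key (the key and model names below are hypothetical):

```python
# Sketch: after configuring the private deployment's base URL and auth headers
# as a virtual key in the UI, requests look like any other provider's.
from portkey_ai import Portkey

client = Portkey(api_key="PORTKEY_API_KEY", virtual_key="private-llm-vk")  # placeholder

resp = client.chat.completions.create(
    model="my-private-model",  # hypothetical model name on your deployment
    messages=[{"role": "user", "content": "Hello from inside the VPC!"}],
)
```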
API Keys with Default Configs & Metadata
You can now attach a default Portkey config and metadata to any API key you create (a usage sketch follows the list below):
- Automatically monitor how a service/user is consuming Portkey API by enforcing metadata
- Apply Guardrails on requests automatically by adding them to Configs and attaching that to the key
- Set default fallbacks for outgoing requests
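The payoff is that calling code stays bare. A sketch, assuming the key was created with a default config and metadata attached in the UI:

```python
# Sketch: a key created with default config + metadata needs nothing extra per
# request — its guardrails, fallbacks, and metadata tagging apply automatically.
from portkey_ai import Portkey

client = Portkey(api_key="PORTKEY_KEY_WITH_DEFAULTS")  # defaults attached at creation

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "ping"}],
)
# The request is logged with the key's metadata and routed through the key's
# default config, with no per-call setup.
```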
Enterprise
Running AI at scale requires robust security, visibility, and control. This month, we’ve launched a comprehensive set of enterprise features to enable that:
Authentication & Access Control
- JWT Authentication: Secure API access with JWT tokens, with support for JWKS URL and custom claims validation (see the sketch below)
- Workspace Management: Manage workspace access and control who can view logs or create API keys from the Admin dashboard
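A hedged sketch of the JWT flow: our assumption here is that, with JWT auth enabled for your org, clients present the JWT (validated against your JWKS URL) in place of a static Portkey API key; the virtual key name is a placeholder:

```python
# Hedged sketch: present a JWT from your identity provider instead of a
# static Portkey API key (header usage is an assumption).
import requests

jwt_token = "<JWT_FROM_YOUR_IDP>"  # placeholder

resp = requests.post(
    "https://api.portkey.ai/v1/chat/completions",
    headers={
        "x-portkey-api-key": jwt_token,        # assumption: JWT in place of API key
        "x-portkey-virtual-key": "openai-vk",  # placeholder
        "Content-Type": "application/json",
    },
    json={"model": "gpt-4o", "messages": [{"role": "user", "content": "hi"}]},
)
print(resp.json())
```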
Governance & Compliance
- Metadata Schemas: Enforce standardized request metadata across teams, crucial for governance and cost allocation (see the sketch after this list)
- Audit Logging: Track every critical action across both the Portkey app and Admin API, with detailed user attribution
- Security Settings: Expanded controls for managing log visibility and API key creation
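A sketch of sending the metadata an enforced schema requires; the field names below are illustrative, not a prescribed schema:

```python
# Sketch: attach the metadata your workspace schema enforces. Field names are
# illustrative; Portkey can validate requests against the enforced schema.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="openai-vk",  # placeholder
    metadata={"_user": "alice", "team": "search", "env": "prod"},
)

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
)
```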
Customer Love
After evaluating 17 different platforms, one AI team replaced over two years of homegrown tooling with Portkey Prompts.
They were able to do this because of three things:
- They could build reusable prompts with our partial templates
- Our versioning let them confidently roll out changes
- And they didn’t have to refactor anything thanks to our OpenAI-compatible APIs
Integrations
Models & Providers
Deepseek R1
Access Deepseek's latest reasoning model through multiple providers: the direct API, Fireworks AI, Together AI, OpenRouter, Groq, AWS Bedrock, Azure AI Inference, and more.
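The same OpenAI-style call works across all of these; only the virtual key and the provider's model slug change. A sketch using Fireworks' naming (the virtual key is a placeholder):

```python
# Sketch: call Deepseek R1 through a provider virtual key. The model slug
# varies by provider; Fireworks' slug is shown here.
from portkey_ai import Portkey

client = Portkey(api_key="PORTKEY_API_KEY", virtual_key="fireworks-vk")  # placeholder

resp = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-r1",
    messages=[{"role": "user", "content": "How many primes are below 100?"}],
)
print(resp.choices[0].message.content)
```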
Gemini Thinking Model
To keep responses OpenAI-compatible, you can choose whether Portkey returns the model's reasoning tokens.
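A hedged sketch of opting in to the reasoning tokens by relaxing strict OpenAI compliance; treat both the flag placement and the model name as assumptions tied to current Portkey/Gemini naming:

```python
# Hedged sketch: relax strict OpenAI compliance so non-OpenAI fields (like the
# model's thinking/reasoning output) come back in the response.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="google-vk",          # placeholder
    strict_open_ai_compliance=False,  # assumption: flag name per the Python SDK
)

resp = client.chat.completions.create(
    model="gemini-2.0-flash-thinking-exp",  # assumption: current model name
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
```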
o3-mini
Available across both OpenAI & Azure OpenAI
Perplexity Sonar
Along with support for their citations and other features
Replicate
Full support for Replicate’s model marketplace
Libraries & Tools
Milvus DB
Direct routing support for Milvus vector database
Open WebUI
Expanded integration capabilities
Langchain
Enhanced documentation and integration guides
Guardrails
Inverse Guardrail
All eligible checks now have an Inverse option in the UI, which returns a TRUE verdict when the underlying Guardrail check fails.
AWS Bedrock Guardrails
Native support for AWS Bedrock’s guardrail capabilities
Promptfoo
Enhanced prompt testing capabilities with Promptfoo’s suite of evals
Guardrails on Embedding Requests
Portkey Guardrails now work on your embedding input requests!
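A sketch of the same input-guardrail pattern applied to an embedding call, so sensitive text can be caught before it is indexed; the guardrail ID and virtual key are placeholders:

```python
# Sketch: input guardrails now also run on embedding requests.
from portkey_ai import Portkey

client = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="openai-vk",  # placeholder
    config={"input_guardrails": ["pii_guardrail_id"]},  # illustrative ID
)

emb = client.embeddings.create(
    model="text-embedding-3-small",
    input=["Customer note: call me at 555-0100 about the invoice."],
)
```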
Community
We are attending the AI Engineer Summit in NYC this February and have some extra event passes to share! Reach out to us on Discord to ask for a pass.
We are also hosting small meetups in NYC and Mumbai this month to meet with local engineering leaders and ML/AI platform leads. Register for them below:
Resources
EDA for Agents
Last month we hosted an inspiring AI practitioners meetup with Ojasvi Yadav and Anudeep Yegireddi to discuss the role of Event-Driven Architecture in building Multi-Agent Systems using MCP.
Essential reading for your AI infrastructure:
- LLMs in Prod Report 2025: Comprehensive analysis of production LLM usage patterns
- The Real Cost of Building an LLM Gateway: Understanding infrastructure investments
- Critical Role of Audit Logs: Enterprise AI governance
- Error Library: New documentation covering common errors across 30+ providers
- Deepseek on Fireworks: How to use Portkey with Fireworks to call Deepseek’s R1 model for reasoning tasks
Improvements
- Token counting is now more accurate for Anthropic streams
- Added logprobs for Vertex AI
- Improved usage object mapping for Perplexity
- Fixed some tricky cache behaviors
- Error handling is more robust across all SDKs