About
A leading delivery platform that connects users with restaurants, grocery stores, and retailers, providing fast and efficient services.
Industry
Online food delivery
Company Size
10,000+ employees
Headquarters
North America
Founded
Early 2010s
Why Portkey
Unified Files, Batch, and Fine-tuning APIs
The growing complexity of AI at scale
By early 2024, this company had expanded its AI initiatives across multiple teams, supporting use cases in fraud detection, customer service automation, and internal tooling. What started as a series of isolated experiments had grown into a vast AI ecosystem, with over 400 engineers building AI-powered features on top of 150+ models spread across multiple cloud providers.
Managing this scale came with serious infrastructure challenges. The machine learning platform team was under pressure to maintain performance, reliability, and cost efficiency while enabling fast-paced AI development.
The challenges of multi-provider AI infrastructure
As AI became more deeply integrated across teams, the delivery platform started experiencing mounting operational headaches.
Their engineers had to connect to dozens of different LLM providers, each with its own API to integrate, rate limits to monitor, and performance profile to account for. When calls to these services failed (which happened regularly at their scale), they needed fallback systems that could step in and rescue those requests.
High-volume AI workloads can get expensive fast, especially without proper optimization. To reduce unnecessary costs, they wanted to set up request deduplication and caching for common queries.
All this while keeping customer data secure and meeting compliance requirements.
The challenge wasn’t just about technology. It was about creating an infrastructure that balanced flexibility for developers with the control required at an enterprise level.
We needed a way to give teams the freedom to innovate while maintaining control over costs and security. The challenge wasn't just technical—it was about building a platform that could scale with our growth.
~ Engineering Lead, AI Platform Team
Finding the right solution
The team explored several options before discovering Portkey’s open-source AI Gateway. They were looking for a solution that could handle large-scale production workloads without sacrificing enterprise security. Portkey stood out for its:
Battle-tested gateway with sub-5ms latency
Support for all major LLM providers, enabling seamless provider switching
Intelligent request deduplication and caching to reduce redundant calls (see the sketch after this list)
Comprehensive cost monitoring and optimization tools
Enterprise-grade security architecture for compliance and data protection
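For illustration, here is a minimal sketch of how fallback routing, retries, and caching can be expressed through a Portkey gateway config with the Python SDK. The virtual keys, models, and settings are placeholders rather than this company’s actual configuration, and the exact config schema should be confirmed against Portkey’s documentation.

```python
# Illustrative sketch only: a gateway config that falls back across providers,
# retries transient failures, and caches repeated requests.
# Virtual keys, models, and values below are placeholders.
from portkey_ai import Portkey

gateway_config = {
    "strategy": {"mode": "fallback"},  # try targets in order until one succeeds
    "targets": [
        {
            "virtual_key": "openai-prod",  # hypothetical virtual key for OpenAI
            "override_params": {"model": "gpt-4o"},
        },
        {
            "virtual_key": "anthropic-prod",  # hypothetical virtual key for Anthropic
            "override_params": {"model": "claude-3-5-sonnet-20240620"},
        },
    ],
    "retry": {"attempts": 3},  # retry failed calls before falling back
    "cache": {"mode": "simple", "max_age": 3600},  # serve repeated requests from cache for an hour
}

client = Portkey(api_key="YOUR_PORTKEY_API_KEY", config=gateway_config)

response = client.chat.completions.create(
    messages=[{"role": "user", "content": "Summarize today's order trends."}]
)
print(response.choices[0].message.content)
```

Configs like this can also be saved in Portkey and referenced by ID, which keeps routing and caching policy out of application code and lets it change without a redeploy.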
Implementing an AI infrastructure built for scale
To maintain control over sensitive data, the company deployed Portkey in a hybrid model: running the data plane within their VPC while using Portkey’s cloud-based control and optimization features (see the sketch after the list below). This approach ensured that:
Sensitive data remained within their infrastructure
Observability across all AI requests was maintained
Cost tracking and usage monitoring were streamlined
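To show what this hybrid pattern looks like from an application’s perspective, here is a hypothetical sketch that points the Python SDK at a gateway running inside a private network rather than at Portkey’s hosted endpoint. The gateway URL and virtual key are placeholders, and the base_url override should be verified against the SDK documentation.

```python
# Hypothetical hybrid setup: requests go to a self-hosted gateway (data plane)
# inside the company's own VPC, so prompts and completions stay on their network,
# while Portkey's control plane provides configuration and observability.
# The URL, keys, and base_url parameter are illustrative assumptions.
from portkey_ai import Portkey

client = Portkey(
    api_key="YOUR_PORTKEY_API_KEY",  # control-plane credentials
    base_url="https://ai-gateway.internal.example.com/v1",  # placeholder VPC-hosted gateway URL
    virtual_key="openai-prod",  # provider credentials referenced by a virtual key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Classify this support ticket: ..."}],
)
print(response.choices[0].message.content)
```

Keeping the data plane inside the VPC is what lets request payloads stay within the company’s boundary while usage and cost metrics still flow to the control plane for monitoring.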
The platform now processes tens of millions of AI requests per quarter. It automatically retries failed requests, dynamically reroutes traffic across LLM providers, and optimizes API calls through caching.
The question wasn’t whether we could build this infrastructure ourselves—we absolutely could. The question was whether we should dedicate our best engineers to infrastructure instead of AI products that drive business value.
~ Platform Director, AI Division
The impact: More efficiency, lower costs, and faster iteration
After implementing Portkey, the delivery platform saw tangible benefits that went beyond solving technical hurdles.
The system absorbed a 3,100x increase in traffic without breaking a sweat. Even during peak load, with traffic spiking to 1,800 requests per second, the platform remained stable.
Reliability is built in. Smart fallback logic has already rescued nearly half a million failed requests, ensuring a seamless user experience. Caching prevents redundant API calls, and optimized routing directs traffic to the most cost-effective providers. Together, these systems have reduced overall LLM spend, saving over $500,000 to date.
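The cost-aware routing described above maps naturally onto a weighted load-balancing strategy in a gateway config. The sketch below is purely illustrative; the providers, models, weights, and cache settings are placeholders rather than the platform’s actual policy.

```python
# Illustrative only: send most traffic to a cheaper model and a smaller share
# to a premium one, with a semantic cache to skip near-duplicate requests.
# Virtual keys, models, weights, and cache settings are placeholders.
cost_aware_config = {
    "strategy": {"mode": "loadbalance"},
    "targets": [
        {
            "virtual_key": "economy-provider",  # hypothetical virtual key
            "weight": 0.8,  # roughly 80% of traffic
            "override_params": {"model": "gpt-4o-mini"},
        },
        {
            "virtual_key": "premium-provider",  # hypothetical virtual key
            "weight": 0.2,  # roughly 20% of traffic
            "override_params": {"model": "gpt-4o"},
        },
    ],
    "cache": {"mode": "semantic", "max_age": 3600},
}
```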
Adding new LLM providers, once a week-long integration, now takes hours. Teams can test and deploy new models quickly, accelerating time-to-value across use cases.
The system is now used by more than 1,000 engineers across 350+ workspaces. It routes traffic across Anthropic, OpenAI, Vertex AI, and other providers, automatically retrying failed requests, optimizing calls through caching, and maintaining 99.99% uptime across billions of requests.