About: A leading delivery platform that connects users with restaurants, grocery stores, and retailers, providing fast and efficient services.
Industry: Online food delivery
Company Size: 10,000+ employees
Headquarters: North America
Founded: Early 2010s
Favorite Feature: Unified files and batch API, fine-tuning API
150+ LLM Providers
1000+ Engineers
1BN+ AI Requests
The growing complexity of AI at scale
By early 2024, this company had expanded its AI initiatives across multiple teams, supporting use cases in fraud detection, customer service automation, and internal tooling. What started as a series of isolated experiments had grown into a vast AI ecosystem, with over 400 engineers building AI-powered features on top of 150+ models spread across multiple cloud providers.
Managing this scale came with serious infrastructure challenges. The machine learning platform team was under pressure to maintain performance, reliability, and cost efficiency while enabling fast-paced AI development.
The challenges of multi-provider AI infrastructure
As AI became more deeply integrated across teams, the delivery platform started experiencing mounting operational headaches.
Their engineers had to connect to dozens of different LLM providers, each with its own API to integrate, rate limits to monitor, and performance that varied widely between services. When calls to these services failed (which happened regularly at their scale), they needed fallback systems that could step in quickly and rescue those requests.
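To make that failure-handling pattern concrete, here is a minimal Python sketch of fallback across providers. It is not Portkey's API; the provider names, endpoints, and keys are hypothetical placeholders, and a real gateway adds retries, backoff, and load-aware routing on top of this.

```python
import requests

# Hypothetical providers in order of preference; a gateway would manage this list centrally.
PROVIDERS = [
    {"name": "provider-a", "url": "https://api.provider-a.example/v1/chat/completions", "key": "KEY_A"},
    {"name": "provider-b", "url": "https://api.provider-b.example/v1/chat/completions", "key": "KEY_B"},
]

def chat_with_fallback(payload: dict, timeout: float = 10.0) -> dict:
    """Try each provider in order and return the first successful response."""
    last_error = None
    for provider in PROVIDERS:
        try:
            resp = requests.post(
                provider["url"],
                json=payload,
                headers={"Authorization": f"Bearer {provider['key']}"},
                timeout=timeout,
            )
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException as err:
            # Rate limit, timeout, or outage: record the error and fall through to the next provider.
            last_error = err
    raise RuntimeError(f"All providers failed; last error: {last_error}")
```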
High-volume AI workloads can get expensive fast, especially without proper optimization. To reduce unnecessary costs, they wanted to set up request deduplication and caching for common queries.
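As a rough sketch of that idea (assuming an in-memory dict stands in for whatever shared cache the gateway actually uses), deduplication boils down to hashing the normalized request and reusing the stored response on a hit:

```python
import hashlib
import json
from typing import Callable

# In-memory stand-in for a shared cache; a production setup would add TTLs and a store like Redis.
_cache: dict[str, dict] = {}

def cache_key(payload: dict) -> str:
    # Canonicalize the request so identical prompts and parameters map to the same key.
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

def cached_call(payload: dict, call_provider: Callable[[dict], dict]) -> dict:
    """Return the cached response for an identical payload; call the provider only on a miss."""
    key = cache_key(payload)
    if key not in _cache:
        _cache[key] = call_provider(payload)
    return _cache[key]
```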
And all of this had to happen while keeping customer data secure and meeting compliance requirements.
The challenge wasn’t just about technology - it was about creating an infrastructure that balanced flexibility for developers with the control required at an enterprise level.
We needed a way to give teams the freedom to innovate while maintaining control over costs and security. The challenge wasn't just technical—it was about building a platform that could scale with our growth.
~ Engineering Lead, AI Platform Team
Finding the right solution
The team explored several options before discovering Portkey’s open-source AI Gateway. They were looking for a solution that could handle large-scale production workloads without sacrificing enterprise security. Portkey stood out for the following capabilities (brought together in the config sketch after this list):
Battle-tested gateway with sub-5ms latency
Support for all major LLM providers, enabling seamless provider switching
Intelligent request deduplication and caching to reduce redundant calls
Comprehensive cost monitoring and optimization tools
Enterprise-grade security architecture for compliance and data protection
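Several of these capabilities come together in a single gateway configuration. The sketch below follows Portkey's documented config shape as best understood; the exact field values and the virtual key names are illustrative assumptions, not the company's production setup:

```python
# Illustrative routing config: fallback strategy, automatic retries, and response caching.
# Virtual keys stand in for stored provider credentials; the names here are hypothetical.
gateway_config = {
    "strategy": {"mode": "fallback"},
    "retry": {"attempts": 3},
    "cache": {"mode": "simple", "max_age": 3600},
    "targets": [
        {"virtual_key": "primary-provider"},  # tried first
        {"virtual_key": "backup-provider"},   # used when the primary fails or is rate limited
    ],
}
```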
Implementing an AI infrastructure built for scale
To maintain control over sensitive data, the company deployed Portkey in a hybrid model—running the data plane within their VPC while using Portkey’s cloud-based control and optimization features. This approach, illustrated in the sketch after the list below, ensured that:
Sensitive data remained within their infrastructure
Observability across all AI requests was maintained
Cost tracking and usage monitoring were streamlined
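In practice, application code then talks to the self-hosted gateway over the private network. A minimal sketch, assuming the gateway exposes an OpenAI-compatible endpoint at a hypothetical internal hostname (the exact credentials and headers depend on how the gateway is configured):

```python
from openai import OpenAI

# Point any OpenAI-compatible client at the gateway running inside the VPC,
# so prompts and responses never leave the private network.
client = OpenAI(
    base_url="https://ai-gateway.internal.example.com/v1",  # hypothetical internal hostname
    api_key="GATEWAY_CREDENTIAL",  # gateway-issued credential, not a raw provider key
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; the gateway's routing config picks the actual provider
    messages=[{"role": "user", "content": "Where is my order?"}],
)
print(response.choices[0].message.content)
```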
The platform now processes tens of millions of AI requests per quarter. It automatically retries failed requests, dynamically reroutes traffic across LLM providers, and optimizes API calls through caching.
The question wasn’t whether we could build this infrastructure ourselves—we absolutely could. The question was whether we should dedicate our best engineers to infrastructure instead of AI products that drive business value.
~ Platform Director, AI Division
The impact: More efficiency, lower costs, and faster iteration
After implementing Portkey, the delivery platform saw tangible benefits that went beyond just solving their technical challenges.
Their system now handles tens of millions of AI requests without breaking a sweat, running smoothly even during peak traffic periods. Smart retry mechanisms rescue hundreds of thousands of failed requests, maintaining a seamless experience for users. Additionally, caching prevents redundant API calls, while optimized routing directs traffic to the most cost-effective providers for each type of request. Together, these changes led to a remarkable 40% reduction in overall LLM costs.
Security remains rock-solid with zero incidents since deployment. The hybrid architecture keeps sensitive data protected while still leveraging cloud-based management features.
The technical team gained agility too. Previously, adding a new LLM provider took weeks of integration work. Now they can bring new models online in hours, creating 10x faster deployment cycles for AI features across the organization.
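As a sketch of why onboarding shrank from weeks to hours: adding a provider becomes a routing-config change rather than a new integration in application code (the virtual key names below are hypothetical):

```python
# Application code keeps calling the gateway; only the routing config grows.
routing_config = {
    "strategy": {"mode": "fallback"},
    "targets": [
        {"virtual_key": "primary-provider"},
        {"virtual_key": "backup-provider"},
    ],
}

# Bringing a new model online: append one target and roll out the updated config.
routing_config["targets"].append({"virtual_key": "new-provider-eval"})
```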
Perhaps most impressive is the reliability—the platform maintains 99.99% uptime, ensuring AI capabilities are available whenever and wherever they're needed throughout the business.