AI Engineering Hours
Discussion notes from the weekly AI engineering meetup
29 Nov
Summary
Teams from Springworks and Haptik shared hard-won insights from running LLMs in production: Gemini is more reliable than gpt-4o for Hinglish audio translation, especially with noisy input, and shifting to a managed Gateway cuts latency in half. The session also covered practical tips on caching and RAG optimization at scale.
Attendees
Karan Trehan
SDE-2, Springworks
Komal Singh
DevOps Engineer, Jio Haptik
Pratham Naveen
Gen AI, NetApp
Vinodraj V K
Gen AI, NetApp
Notes
On Production Patterns
- Haptik & Springworks map Portkey virtual keys to their model deployments, making it simple for engineers to prototype & build AI features
- Monitor Portkey analytics to understand deployment behavior and pre-scale resources to avoid rate limits
- For secure testing, use short-lived virtual keys instead of sharing long-term access
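The notes do not describe how Portkey issues or expires virtual keys, so the following is only a generic, standard-library sketch of the short-lived-credential idea: each key carries an expiry timestamp and stops validating after it. `TemporaryKey` and `issue_test_key` are hypothetical names, not part of Portkey's API; in practice you would rely on whatever expiry or revocation mechanism your gateway provides.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemporaryKey:
    """Hypothetical short-lived credential handed to a single tester."""
    token: str
    expires_at: float

    def is_valid(self, now: Optional[float] = None) -> bool:
        # The key is usable only strictly before its expiry timestamp.
        current = time.time() if now is None else now
        return current < self.expires_at


def issue_test_key(ttl_seconds: float, now: Optional[float] = None) -> TemporaryKey:
    """Issue a random, URL-safe token that expires ttl_seconds after issue."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    issued = time.time() if now is None else now
    return TemporaryKey(token=secrets.token_urlsafe(32), expires_at=issued + ttl_seconds)
```

For example, `issue_test_key(ttl_seconds=3600, now=1000.0)` returns a key that is valid at `now=2000.0` but not at `now=5000.0`. The optional `now` argument exists only to make the expiry logic easy to test.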
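One way to act on the pre-scaling tip is to compare peak observed traffic against a deployment's rate limit, with some headroom. This sketch assumes you can export per-minute request counts from your analytics (the notes do not specify a format); `needs_prescale` and its 80% headroom default are illustrative choices, not Portkey behavior.

```python
from typing import Sequence


def needs_prescale(
    requests_per_minute: Sequence[int],
    rpm_limit: int,
    headroom: float = 0.8,
) -> bool:
    """Return True if peak observed traffic exceeds `headroom` of the rate limit.

    `requests_per_minute` is a hypothetical export of per-minute request
    counts for one deployment; 0.8 is an arbitrary example threshold.
    """
    if rpm_limit <= 0:
        raise ValueError("rpm_limit must be positive")
    if not requests_per_minute:
        return False  # no traffic observed, nothing to scale for
    peak = max(requests_per_minute)
    return peak > headroom * rpm_limit
```

`needs_prescale([300, 420, 510], rpm_limit=600)` returns `True` because 510 exceeds 80% of 600 (480), while a peak of 450 would return `False`. The peak is used rather than an average because rate limits are hit during bursts.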
Some Learnings
- Infrastructure insight: Each additional middleware layer (auth, rate limiting) adds latency that compounds at scale; consider using Gateway features directly instead of custom layers
- Plan for caching early: Auxiliary services inevitably add latency at scale, so implement caching in your initial development cycle
- In RAG pipelines, vector DB operations become bottlenecks before LLM calls do; optimize these first
- For Hinglish audio translations, especially with noise, Gemini proves more reliable than gpt-4o
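A toy model helps show why stacked middleware hurts more at scale than per-request averages suggest. Assume, purely for illustration, that each layer independently has a small probability `p` of responding slowly; a request passing through `n` layers is slow if any one of them is, which happens with probability `1 - (1 - p) ** n`. The function name `prob_any_slow` is hypothetical.

```python
def prob_any_slow(p_slow_per_layer: float, layers: int) -> float:
    """Probability that at least one of `layers` independent hops is slow.

    Toy model: every layer is assumed to be slow independently with the
    same probability; real systems have correlated slowdowns.
    """
    if not 0.0 <= p_slow_per_layer <= 1.0:
        raise ValueError("p_slow_per_layer must be in [0, 1]")
    if layers < 0:
        raise ValueError("layers must be non-negative")
    # Complement of "every layer is fast".
    return 1.0 - (1.0 - p_slow_per_layer) ** layers
```

With `p = 0.01`, one layer gives a 1% chance of a slow request, while five layers give about 4.9%; at high request volumes that gap is a large absolute number of slow responses. Mean latency, by contrast, simply adds up layer by layer.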
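A minimal sketch of building caching in from the start, assuming exact-match reuse is acceptable for your use case: responses are keyed by a SHA-256 hash of the canonicalized request and expire after a TTL. `ResponseCache` and `get_or_call` are hypothetical; the notes do not say which caching strategy the teams used, and a cache built into your gateway or semantic caching (matching similar prompts) are alternatives.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Tuple


class ResponseCache:
    """Exact-match, in-memory cache for LLM responses with a TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # Maps request hash -> (expiry time, cached response).
        self._entries: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def key_for(request: Dict[str, Any]) -> str:
        # Canonical JSON (sorted keys, no extra whitespace) so that
        # equivalent requests hash to the same key.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get_or_call(
        self,
        request: Dict[str, Any],
        call: Callable[[Dict[str, Any]], Any],
    ) -> Any:
        key = self.key_for(request)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now < hit[0]:
            return hit[1]  # fresh hit: skip the model call entirely
        response = call(request)  # miss or expired entry: call the model
        self._entries[key] = (now + self.ttl_seconds, response)
        return response
```

Call `cache.get_or_call(request, call_model)`, where `call_model` is your own client function; repeated identical requests within the TTL never reach the model. Sorting keys before hashing means `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` map to the same entry.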
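Before optimizing a RAG pipeline, it helps to confirm where time actually goes. This hypothetical `StageTimer` accumulates wall-clock time per named stage so you can check whether vector DB queries, rather than the LLM call, dominate; it is a generic sketch, not tied to any particular vector DB or client library.

```python
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


class StageTimer:
    """Accumulates elapsed time per named pipeline stage."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter):
        self._clock = clock
        self.totals: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = self._clock()
        try:
            yield
        finally:
            # Record elapsed time even if the stage raises.
            elapsed = self._clock() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def bottleneck(self) -> str:
        """Name of the stage with the largest accumulated time."""
        if not self.totals:
            raise ValueError("no stages recorded")
        return max(self.totals, key=self.totals.__getitem__)
```

Wrap retrieval in `with timer.stage("vector_search"):` and generation in `with timer.stage("llm_call"):`, run a representative batch of queries, then read `timer.totals` or `timer.bottleneck()`. The injectable `clock` defaults to `time.perf_counter` and can be swapped for a fake clock in tests.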