Unpacking Semantic Caching at Walmart

Last month, the LLMs in Prod community had the pleasure of hosting Rohit Chatter, Chief Software Architect at Walmart Tech Global, for a fireside chat on Gen AI and semantic caching in retail. This conversation spanned a wide range of topics, from Rohit's personal journey in the tech industry to the intricate details of Walmart's semantic caching stack!

Dive into the complete conversation

For those who want to revisit the full conversation, here's the link to the recording. We've also opened a Q&A forum to answer further questions on the topic with Rohit's assistance!

Access the resources here.

And if you want a quick summary of the discussion along with the technical highlights of the chat, we've compiled both below! ↓

Insights from the Fireside Chat

1. Walmart's Shift to Generative AI for Search

Walmart transitioned from traditional NLP methods to Generative AI in its e-commerce search, enabling better handling of complex, contextually relevant product groupings and improving the accuracy of product recommendations.

The company found Generative AI particularly effective in addressing ambiguous, long-tail, natural language queries with limited historical data, enhancing the understanding of customer preferences and product retrieval.

2. Evolution of Search Models

Walmart initially used NLP with semantic and lexical (fuzzy) matching, then shifted to Generative AI models, like the BERT-based MiniLM V6, to better handle long-tail queries. The models, including MiniLM, E5 small V2, and others, were selected based on their performance in specific Walmart use cases.

3. Customising AI Models

Walmart now uses models like MiniLMv2 and T0, fine-tuned with their customer engagement data, to improve search relevance for ambiguous queries. This fine-tuning is an ongoing process, adapted periodically based on customer behaviour changes.

4. Enhancing Query Processing

The company uses Approximate Nearest Neighbour (ANN) search for relevance matching with slow-changing data in their catalogue, and other techniques for fast-changing data. Their system encodes queries into vectors, using ANN on a catalogue embedding index for improved product matching.
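To make the retrieval step concrete, here is a minimal sketch of looking up a query vector in a catalogue embedding index. It uses exact brute-force cosine search, which is the baseline that an ANN index approximates at scale; it is not Walmart's implementation. The names `CATALOGUE_INDEX` and `top_k`, the product IDs, and the three-dimensional vectors are all hypothetical, and a real system would obtain the vectors from an embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec, index, k=2):
    """Return the k product IDs whose embeddings are closest to the query.

    This scans every entry (exact search). An ANN index trades a little
    accuracy for speed by avoiding the full scan.
    """
    scored = [(cosine(query_vec, vec), pid) for pid, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pid for _, pid in scored[:k]]

# Hypothetical toy index: product ID -> embedding (assumed values).
CATALOGUE_INDEX = {
    "tent": [0.9, 0.1, 0.0],
    "sleeping_bag": [0.8, 0.3, 0.1],
    "blender": [0.0, 0.2, 0.9],
}

# Assumed embedding of a camping-related query.
results = top_k([1.0, 0.2, 0.0], CATALOGUE_INDEX, k=2)
```

With these toy vectors the two camping products rank ahead of the blender, because their embeddings point in nearly the same direction as the query.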

5. Search Quality Metrics

Walmart measures search quality using the metric NDCG@10, focusing on improving the relevance of the top 10 search results. Meeting or exceeding this benchmark is crucial for production readiness.
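For readers unfamiliar with the metric, the sketch below computes NDCG@10 for one ranked result list. It assumes the common linear-gain formulation with a log2 rank discount and assumes the list contains every judged item, so the ideal ordering is just the same grades sorted. The chat did not specify which variant or grading scale Walmart uses; the function names and relevance grades here are illustrative.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top k results.

    relevances[i] is the graded relevance of the result at rank i + 1;
    each gain is divided by log2(rank + 1).
    """
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalised by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Assumed relevance grades for one query, in the order the engine returned them.
perfect = ndcg_at_k([3, 2, 1, 0])
swapped = ndcg_at_k([0, 1])
```

A perfectly ordered list scores 1.0, and any list that places a less relevant item above a more relevant one scores lower, which is why the metric rewards getting the top of the page right.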

6. Semantic Caching

To enhance caching efficiency, Walmart implemented semantic caching that clusters queries based on conceptual similarity. This system, using cosine similarity between query embeddings, achieves about a 50% cache hit rate, reducing reliance on expensive LLM search processes.
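The idea can be sketched as a cache keyed by query embeddings instead of exact query strings: a lookup is a hit when the new query's embedding is close enough, by cosine similarity, to one already stored. This is a minimal illustration and not Walmart's stack. The `SemanticCache` class, the 0.9 threshold, and the toy vectors are assumptions; a production cache would get embeddings from a model and use an index in place of the linear scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class SemanticCache:
    """Hypothetical cache that matches queries by embedding similarity."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # assumed cut-off; tune per use case
        self.entries = []           # list of (embedding, cached_result)

    def put(self, query_vec, result):
        self.entries.append((query_vec, result))

    def get(self, query_vec):
        """Return the cached result of the most similar stored query,
        or None when nothing clears the threshold (a cache miss)."""
        best_score, best_result = 0.0, None
        for vec, result in self.entries:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_result = score, result
        return best_result if best_score >= self.threshold else None

cache = SemanticCache(threshold=0.9)
cache.put([0.9, 0.1, 0.0], ["tent", "sleeping_bag"])  # assumed embedding

hit = cache.get([0.85, 0.15, 0.05])  # differently worded, similar meaning
miss = cache.get([0.0, 0.1, 0.9])    # unrelated query
```

On a miss, the caller would fall through to the full search pipeline and store the result with `put`. The threshold controls the trade-off: a lower value raises the hit rate but risks serving results for a query that only looks similar.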

7. Challenges and Future Plans

A major challenge for Walmart is bringing search latency under 2-3 seconds at scale. Over the next 3 years, the company plans to invest in generative AI for personalization, voice search, visual search, and ambient discovery experiences.

8. ROI on Gen AI

Walmart anticipates significant long-term ROI from implementing LLMs and generative AI, especially in generating relevant search results for complex queries and achieving more efficient caching processes.


We thank Rohit for sharing his candid thoughts with the community and look forward to hosting him again in the future! The chat highlights how companies like Walmart are thinking deeply about Gen AI, and it underscores the broader shift towards better customer experience, aided by Gen AI.

For those eager to explore the full breadth of questions addressed to Rohit, we invite you to join the LLMs in Prod community on Discord.

To stay updated on upcoming events hosted by Portkey, subscribe to our events calendar below ↓
