Over the past few months, we've been closely tracking latencies for both GPT-3.5 and GPT-4, and the emerging patterns have been intriguing. The standout observation? GPT-4 is catching up in speed, steadily closing the latency gap with GPT-3.5, and our measurements show a consistent decline in GPT-4 latency.
It's been some time since Llama 2's celebrated launch; the dust has settled a bit and real use cases have come to life. In this blog post, we answer frequently asked questions about Llama 2's capabilities and when you should be using it. Let's dive in!
In this blog post, we explore a roadmap for building reliable large language model applications. Let’s get started!
Implementing a semantic cache from scratch for production use cases.
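For a sense of what that involves, here is a minimal sketch of the core lookup: store an embedding per cached query, and serve the stored response when a new query is similar enough. Everything here is illustrative rather than the post's implementation; the hash-based `embed` stub and the 0.9 similarity threshold are assumptions.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in embedding for illustration only: a real deployment would call
    # an embedding model here. This deterministic hash-based vector keeps the
    # sketch self-contained, but it only matches identical strings.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

class SemanticCache:
    """Minimal in-memory semantic cache: reuse a stored response when a new
    query's embedding is close enough to a previously seen one."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold                        # cosine-similarity cutoff (assumed value)
        self.entries: list[tuple[np.ndarray, str]] = []   # (query embedding, cached response)

    def get(self, query: str) -> str | None:
        q = embed(query)
        for vec, response in self.entries:
            sim = float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec)))
            if sim >= self.threshold:
                return response                           # semantic hit: skip the LLM call
        return None                                       # miss: caller queries the LLM, then calls put()

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))
```

A production version would swap the linear scan for a vector index and add eviction, but the hit/miss contract stays the same.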
The paper discusses the cost of querying large language models (LLMs) and proposes FrugalGPT, a framework that combines LLM APIs to process natural language queries within a budget constraint. The framework uses prompt adaptation, LLM approximation, and LLM cascade to reduce inference cost.
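To make the cascade strategy concrete, the sketch below shows the control flow in rough form (not FrugalGPT's actual code): models are tried from cheapest to most expensive, and a scoring function decides whether an answer is reliable enough to return. The `call_model` and `score` callables and the 0.8 threshold are illustrative assumptions.

```python
from typing import Callable

def llm_cascade(
    query: str,
    models: list[str],                       # ordered cheapest -> most expensive
    call_model: Callable[[str, str], str],   # (model_name, query) -> answer
    score: Callable[[str, str], float],      # (query, answer) -> confidence in [0, 1]
    threshold: float = 0.8,                  # illustrative acceptance threshold
) -> str:
    """Try models in order of cost; return the first answer whose
    confidence clears the threshold, falling back to the last model."""
    answer = ""
    for model in models:
        answer = call_model(model, query)
        if score(query, answer) >= threshold:
            return answer                    # a cheap model was good enough; stop here
    return answer                            # nothing cleared the bar: return the last (strongest) attempt
```

The scorer carries the cost/quality trade-off: a stricter threshold escalates more queries to the expensive models.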
Learn how to use the eval framework to evaluate models and prompts, optimising LLM systems for the best outputs.
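At its simplest, such an eval loop runs each prompt variant over a labelled test set and scores the outputs; the sketch below assumes a `run_prompt` stand-in for the model call and an exact-match grader, neither of which is tied to any particular framework.

```python
from typing import Callable

def evaluate(
    prompts: dict[str, str],            # prompt name -> template containing {input}
    test_set: list[tuple[str, str]],    # (input, expected output) pairs
    run_prompt: Callable[[str], str],   # stand-in for the actual model call
) -> dict[str, float]:
    """Score each prompt variant by exact-match accuracy on the test set."""
    scores = {}
    for name, template in prompts.items():
        hits = sum(
            run_prompt(template.format(input=inp)).strip() == expected
            for inp, expected in test_set
        )
        scores[name] = hits / len(test_set)
    return scores
```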
The paper discusses the importance of managing ambiguity in natural language understanding and evaluates the ability of language models (LMs) to recognize and disentangle possible meanings. The authors present AMBIENT, a linguist-annotated benchmark of 1,645 examples with diverse kinds of ambiguity.
The paper reports on an investigation of an early version of GPT-4, part of a new cohort of LLMs that exhibit more general intelligence than previous AI models. The paper demonstrates that GPT-4 can solve novel and difficult tasks spanning mathematics, coding, vision, medicine, law, psychology, and more.
The paper discusses eight potentially surprising claims about large language models (LLMs), including their predictable increase in capability with increasing investment, the unpredictability of specific behaviors, and the lack of reliable techniques for steering their behavior.
The paper presents the first attempt to use GPT-4 to generate instruction-following data for finetuning Large Language Models (LLMs). The 52K English and Chinese instruction-following examples generated by GPT-4 lead to superior zero-shot performance on new tasks compared to instruction-following data generated by previous state-of-the-art models.