This paper presents a method for compressing prompts fed to large language models (LLMs) in order to accelerate inference and reduce cost. The method combines a budget controller, a token-level iterative compression algorithm, and an instruction-tuning-based approach to distribution alignment. Experimental results on reasoning, summarization, and conversation benchmarks show that prompts can be compressed substantially (up to roughly 20x) with little loss in output quality.
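To make the idea concrete, here is a minimal sketch of perplexity-based prompt compression in the same spirit: a small causal LM scores every token, and the tokens it predicts most easily (low information) are dropped until a target ratio is met. This is an illustration only, not the paper's actual algorithm; the GPT-2 scorer and the 0.5 keep-ratio are assumptions.

```python
# Sketch: drop the least "surprising" tokens of a prompt, as judged by a small LM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def compress_prompt(prompt: str, keep_ratio: float = 0.5) -> str:
    enc = tokenizer(prompt, return_tensors="pt")
    input_ids = enc["input_ids"]
    with torch.no_grad():
        logits = model(input_ids).logits
    # Surprisal of each token given its left context (the first token has none).
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    token_lp = log_probs.gather(-1, input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    surprisal = -token_lp[0]  # higher = more informative
    if surprisal.numel() == 0:
        return prompt  # single-token prompt; nothing to compress
    n_keep = max(1, int(surprisal.numel() * keep_ratio))
    keep = set(torch.topk(surprisal, n_keep).indices.tolist())
    kept_ids = [input_ids[0, 0].item()] + [
        input_ids[0, i + 1].item() for i in range(surprisal.numel()) if i in keep
    ]
    return tokenizer.decode(kept_ids)

print(compress_prompt("Answer concisely: what is the capital city of France?"))
```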
In this blog post, we explore a roadmap for building reliable large language model applications. Let’s get started!
The paper discusses the cost of querying large language models (LLMs) and proposes FrugalGPT, a framework that combines LLM APIs to answer natural language queries within a budget constraint. The framework uses prompt adaptation, LLM approximation, and an LLM cascade to reduce inference cost; experiments show it can match the performance of the best individual LLM (e.g., GPT-4) at a fraction of the cost, or improve accuracy at the same cost.
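For intuition, below is a minimal sketch of the LLM-cascade idea under a budget: models are tried from cheapest to most expensive, and the first answer a scorer accepts is returned. The `call_model` and `score_answer` callables are hypothetical placeholders for your provider client and a learned (or heuristic) answer scorer, not FrugalGPT's actual components.

```python
# Sketch: a budget-aware LLM cascade that escalates only when a cheap answer fails.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tier:
    name: str             # model identifier for this tier
    cost_per_call: float  # rough per-call cost used for budget accounting

def llm_cascade(
    prompt: str,
    tiers: list[Tier],
    call_model: Callable[[str, str], str],
    score_answer: Callable[[str, str], float],
    threshold: float = 0.8,
    budget: float = 1.0,
) -> tuple[str, float]:
    """Return the first acceptable answer and the total spend."""
    spent = 0.0
    answer = ""
    for tier in tiers:
        if spent + tier.cost_per_call > budget:
            break  # respect the budget constraint
        answer = call_model(tier.name, prompt)
        spent += tier.cost_per_call
        if score_answer(prompt, answer) >= threshold:
            return answer, spent  # a cheaper model was good enough; stop early
    return answer, spent  # fall back to the last (most capable) answer tried

# Toy demo with stub functions standing in for real API calls and a real scorer.
demo_tiers = [Tier("cheap-model", 0.001), Tier("expensive-model", 0.03)]
answer, cost = llm_cascade(
    "What is 2 + 2?",
    demo_tiers,
    call_model=lambda name, p: "4",
    score_answer=lambda p, a: 0.9,  # stub scorer; always confident
)
print(answer, cost)
```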
The paper proposes a multi-modal AI system named AudioGPT that complements Large Language Models (LLMs) with foundation models to process complex audio information and solve numerous understanding and generation tasks. AudioGPT is connected to an input/output interface (ASR, TTS) to support spoken dialogue.
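To illustrate the interface idea (not AudioGPT's actual implementation), here is a minimal sketch of one spoken dialogue turn: transcribe the audio with an ASR model, route the text through an LLM, and synthesize the reply with TTS. The `ask_llm` stub is a hypothetical placeholder, and openai-whisper / pyttsx3 are assumed stand-ins for the ASR and TTS components.

```python
# Sketch: wrap a text-only LLM with speech input (ASR) and speech output (TTS).
import whisper   # pip install openai-whisper
import pyttsx3   # pip install pyttsx3

asr_model = whisper.load_model("base")
tts_engine = pyttsx3.init()

def ask_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion API call here.
    return f"(LLM reply to: {prompt})"

def spoken_turn(audio_path: str) -> str:
    """One dialogue turn: audio file in, spoken reply out, text returned."""
    user_text = asr_model.transcribe(audio_path)["text"]  # ASR interface
    reply = ask_llm(user_text)                            # text-only LLM core
    tts_engine.say(reply)                                 # TTS interface
    tts_engine.runAndWait()
    return reply

# Example: spoken_turn("question.wav")
```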
The paper introduces Chameleon, a plug-and-play compositional reasoning framework that augments large language models (LLMs) to address their inherent limitations and tackle a broad range of reasoning tasks. Chameleon synthesizes programs that compose various tools, including LLMs, off-the-shelf vision models, web search engines, Python functions, and heuristic-based modules.
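Here is a minimal sketch of the compositional idea under two assumptions: a toy tool registry and a stubbed planner (in Chameleon, the plan itself is synthesized by an LLM). The executor simply runs the planned tools in order, threading a shared state through them.

```python
# Sketch: a planner emits a sequence of tool names; the executor composes them.
from typing import Callable

def web_search(state: dict) -> dict:
    state["evidence"] = f"(search results for: {state['question']})"
    return state

def run_python(state: dict) -> dict:
    state["computation"] = "(output of a generated Python snippet)"
    return state

def answer_with_llm(state: dict) -> dict:
    state["answer"] = f"(LLM answer using {list(state.keys())})"
    return state

TOOLS: dict[str, Callable[[dict], dict]] = {
    "web_search": web_search,
    "run_python": run_python,
    "answer_with_llm": answer_with_llm,
}

def plan(question: str) -> list[str]:
    # Stub planner: a real system would ask an LLM to synthesize this program.
    return ["web_search", "answer_with_llm"]

def solve(question: str) -> str:
    state = {"question": question}
    for tool_name in plan(question):  # execute the composed program step by step
        state = TOOLS[tool_name](state)
    return state["answer"]

print(solve("Which planet has the most moons?"))
```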
The paper presents the first attempt to use GPT-4 to generate instruction-following data for fine-tuning large language models (LLMs). The 52K English and Chinese instruction-following examples generated by GPT-4 lead to superior zero-shot performance on new tasks compared to instruction-following data generated by previous state-of-the-art models.
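As a rough sketch of the data-generation loop (not the paper's exact pipeline), the snippet below asks a teacher model to answer a few seed instructions and saves the pairs in the instruction/input/output JSON format commonly used for supervised fine-tuning. The model name, prompt wording, seed list, and output file are assumptions; it requires the `openai` package and an API key.

```python
# Sketch: collect teacher-model answers to seed instructions as fine-tuning data.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

seed_instructions = [
    "Explain the difference between a list and a tuple in Python.",
    "Summarize the plot of 'Romeo and Juliet' in two sentences.",
]

records = []
for instruction in seed_instructions:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": instruction}],
    )
    records.append({
        "instruction": instruction,
        "input": "",
        "output": response.choices[0].message.content,
    })

# Save in the Alpaca-style JSON format commonly used for instruction tuning.
with open("gpt4_instruction_data.json", "w") as f:
    json.dump(records, f, indent=2, ensure_ascii=False)
```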