LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models - Summary
Arxiv URL: https://arxiv.org/abs/2310.05736
Authors: Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu
Summary:
This paper presents a method for compressing prompts in large language models (LLMs) to accelerate model inference and reduce cost. The method involves a budget controller, a token-level iterative compression algorithm, and an instruction tuning-based method for distribution alignment. Experimental results show that the proposed approach achieves state-of-the-art performance with up to 20x compression.
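To make the budget-controller idea concrete, here is a toy sketch (not the paper's implementation): the instruction and question are kept intact, and demonstrations are greedily retained in order of an information score until the token budget allotted by the target compression ratio is spent. The `info_score` field and the `allocate_budget` function name are illustrative assumptions, standing in for the small language model's perplexity signal used in LLMLingua.

```python
def allocate_budget(instruction, demos, question, target_ratio):
    """Toy budget controller.

    demos: list of (text, token_count, info_score) tuples, where
    info_score is a stand-in for the small LM's perplexity signal.
    Keeps instruction and question intact; fills the remaining
    budget with the most informative demonstrations.
    """
    # Approximate token counts with whitespace word counts for the sketch.
    fixed = len(instruction.split()) + len(question.split())
    total = fixed + sum(n for _, n, _ in demos)
    budget = int(total * target_ratio)

    spent = fixed
    kept = []
    # Prefer demonstrations the small LM finds most surprising (informative).
    for text, n_tokens, score in sorted(demos, key=lambda d: -d[2]):
        if spent + n_tokens <= budget:
            kept.append(text)
            spent += n_tokens
    return [instruction] + kept + [question]
```

For example, with a 0.5 target ratio and three demonstrations of 50, 40, and 60 tokens, only the highest-scoring demonstration that fits the budget survives; the rest are dropped at this coarse-grained stage before token-level compression runs.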
Key Insights & Learnings:
- The paper introduces LLMLingua, a method for compressing prompts in large language models (LLMs) to accelerate model inference and reduce cost.
- LLMLingua involves a budget controller to allocate compression ratios, a token-level iterative compression algorithm, and an instruction tuning-based method for distribution alignment.
- Experimental results demonstrate that the proposed approach achieves state-of-the-art performance, allowing up to 20x compression with little loss in downstream performance.
- The method is validated on four datasets from different domains, showing its effectiveness across various scenarios.
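The token-level stage can be sketched as follows: tokens whose per-token surprisal under a small language model is low (i.e., easily predicted, hence redundant) are dropped until the target keep ratio is reached. This is a minimal illustration only; the frequency-based surprisal here is a toy stand-in for the small causal LM's perplexity, and the real algorithm scores tokens iteratively, segment by segment, conditioning on already-kept text.

```python
from collections import Counter
import math

def compress(tokens, keep_ratio):
    """Toy token-level compression: keep the most surprising tokens.

    Uses corpus-frequency surprisal (-log p) as a stand-in for a small
    causal LM's per-token perplexity. Ties at the threshold may keep
    slightly more than the exact budget.
    """
    counts = Counter(tokens)
    total = len(tokens)
    # Rare tokens are "surprising" and therefore assumed informative.
    surprisal = {t: -math.log(c / total) for t, c in counts.items()}
    n_keep = max(1, int(total * keep_ratio))
    # Threshold = surprisal of the n_keep-th most surprising token.
    threshold = sorted((surprisal[t] for t in tokens), reverse=True)[n_keep - 1]
    # Preserve original order so the compressed prompt stays readable.
    return [t for t in tokens if surprisal[t] >= threshold]
```

Frequent filler words score low and are pruned first, while rare content-bearing tokens survive, which is why the compressed prompt can stay semantically useful to the target LLM despite aggressive ratios.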
Key Advantages:
- Significant cost reduction in LLM inference
- Maintains semantic integrity of prompts
- Works with black-box LLMs (accessible via API only)
- No need for gradient flow through LLMs
- Compatible with various LLM applications
Terms Mentioned: large language models, LLMs, prompt compression, model inference, budget controller, token-level iterative compression, instruction tuning, distribution alignment, state-of-the-art performance, compression ratios, experimental results, datasets, domains
Affiliation: Microsoft Corporation