Paper Summaries

LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models - Summary


Arxiv URL: https://arxiv.org/abs/2310.05736

Authors: Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang, Lili Qiu

Summary:

This paper presents a method for compressing prompts in large language models (LLMs) to accelerate model inference and reduce cost. The method involves a budget controller, a token-level iterative compression algorithm, and an instruction tuning-based method for distribution alignment. Experimental results show that the proposed approach achieves state-of-the-art performance with up to 20x compression.
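The budget controller's role can be sketched roughly as below. This is a simplified, hypothetical illustration, not the paper's algorithm. `Demo`, `allocate_budget` and their parameters are invented for this example, and each demonstration's `score` is assumed to be precomputed (for instance, a perplexity from a small language model). The sketch reserves tokens for the instruction and question, then greedily fills the rest of the budget with the highest-scoring demonstrations.

```python
import math
from dataclasses import dataclass


@dataclass
class Demo:
    text: str
    n_tokens: int
    score: float  # assumed precomputed importance score (e.g. perplexity under a small LM)


def allocate_budget(
    n_instruction: int,
    n_question: int,
    demos: list[Demo],
    target_ratio: float,
    instruction_ratio: float = 1.0,
    question_ratio: float = 1.0,
) -> tuple[list[Demo], int]:
    """Pick demonstrations so the whole prompt fits a compressed token budget.

    target_ratio is the overall compression goal (5.0 means 5x fewer tokens).
    instruction_ratio / question_ratio are the fractions of those parts that
    are kept (1.0 keeps them whole). Returns the chosen demonstrations in their
    original order and the tokens left over, which a later token-level pass
    could use; the leftover is negative if the instruction and question alone
    exceed the budget.
    """
    total = n_instruction + n_question + sum(d.n_tokens for d in demos)
    budget = math.floor(total / target_ratio)
    # Reserve room for the instruction and the question first.
    reserved = math.ceil(n_instruction * instruction_ratio) + math.ceil(
        n_question * question_ratio
    )
    remaining = budget - reserved
    chosen: list[int] = []
    # Greedily add the highest-scoring demonstrations that still fit.
    for i in sorted(range(len(demos)), key=lambda i: demos[i].score, reverse=True):
        if demos[i].n_tokens <= remaining:
            chosen.append(i)
            remaining -= demos[i].n_tokens
    return [demos[i] for i in sorted(chosen)], remaining


demos = [
    Demo("demo 1", 100, 3.2),
    Demo("demo 2", 120, 5.1),
    Demo("demo 3", 80, 4.0),
    Demo("demo 4", 150, 2.0),
]
kept, left = allocate_budget(20, 30, demos, target_ratio=2.0)
print([d.text for d in kept], left)  # ['demo 2', 'demo 3'] 0
```

With 500 tokens in total and a 2x target, the budget is 250 tokens. After reserving 50 for the instruction and question, the two highest-scoring demonstrations that fit use up the remaining 200.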

Key Insights & Learnings:

The paper introduces LLMLingua, a method for compressing prompts in large language models (LLMs) to accelerate model inference and reduce cost.
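LLMLingua combines a budget controller that allocates different compression ratios to the instruction, demonstrations, and question of a prompt, a token-level iterative compression algorithm, and an instruction tuning-based method that aligns the distribution of the small compressor model with that of the target LLM.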
Experimental results demonstrate that the proposed approach achieves state-of-the-art performance.
The approach allows for up to 20x compression with little performance loss.
The method is validated on four datasets from different domains, showing its effectiveness across various scenarios.
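The token-level iterative step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation. `compress_tokens`, `toy_logprob`, `Scorer`, the `keep_ratio` and `segment_size` parameters, and the stopword-based toy scorer are all hypothetical, and selection here is a simple per-segment top-k rule chosen for the illustration. The idea it shows is that tokens a small language model finds easy to predict (low self-information) are dropped first, and each segment is scored conditioned on the already-compressed prefix.

```python
import math
from typing import Callable, List, Sequence

# log p(token | context). In LLMLingua a small causal language model plays this
# role; here it is a pluggable callable (a hypothetical interface).
Scorer = Callable[[Sequence[str], str], float]

STOPWORDS = {"the", "a", "an", "and", "of", "to", "is"}


def toy_logprob(context: Sequence[str], token: str) -> float:
    """Hypothetical stand-in for a small LM: repeats and stopwords are 'easy'."""
    if token in context:
        return math.log(0.9)
    if token in STOPWORDS:
        return math.log(0.5)
    return math.log(0.01)


def compress_tokens(
    tokens: Sequence[str],
    logprob: Scorer,
    keep_ratio: float,
    segment_size: int = 4,
) -> List[str]:
    """Compress tokens segment by segment, keeping the most informative ones.

    A token's self-information is -log p(token | context), where the context is
    the already-compressed prefix plus the earlier tokens of the current
    segment. In each segment, ceil(len(segment) * keep_ratio) tokens with the
    highest self-information are kept, in their original order.
    """
    kept: List[str] = []
    for start in range(0, len(tokens), segment_size):
        segment = list(tokens[start:start + segment_size])
        info = [-logprob(kept + segment[:j], tok) for j, tok in enumerate(segment)]
        k = math.ceil(len(segment) * keep_ratio)
        # Stable sort: ties are broken by position within the segment.
        top = sorted(range(len(segment)), key=lambda j: info[j], reverse=True)[:k]
        kept.extend(segment[j] for j in sorted(top))
    return kept


tokens = "the model compresses the prompt and the model answers the question".split()
print(compress_tokens(tokens, toy_logprob, keep_ratio=0.5))
# ['model', 'compresses', 'prompt', 'and', 'answers', 'question']
```

The second "model" is dropped because the toy scorer finds it predictable from the already-compressed prefix, which is why conditioning on the compressed context matters. Here 11 tokens shrink to 6 (about 1.8x). A 20x ratio would mean keeping roughly one token in twenty. In a real setup the scorer would wrap a small causal LM.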

Key Advantages:

Significant cost reduction in LLM inference
Maintains semantic integrity of prompts
Works with black-box LLMs (accessible via API only)
No need for gradient flow through LLMs
Compatible with various LLM applications

Terms Mentioned: large language models, LLMs, prompt compression, model inference, budget controller, token-level iterative compression, instruction tuning, distribution alignment, state-of-the-art performance, compression ratios, experimental results, datasets, domains

Organizations Mentioned: Microsoft Corporation

Instruction Tuning with GPT-4 - Summary

The paper presents the first attempt to use GPT-4 to generate instruction-following data for fine-tuning Large Language Models (LLMs). The 52K English and Chinese instruction-following data generated by GPT-4 leads to superior zero-shot performance on new tasks compared to the instruction-following…

Are We Really Making Much Progress in Text Classification? A Comparative Review - Summary

This paper reviews and compares methods for single-label and multi-label text classification, categorizing them into bag-of-words, sequence-based, graph-based, and hierarchical methods. The findings reveal that pre-trained language models outperform all recently proposed graph-based and hierarchy-based methods.

A Survey of Large Language Models - Summary

This paper surveys recent advances in Large Language Models (LLMs), which are Transformer models pre-trained on large-scale corpora. The paper discusses the background, key findings, and mainstream techniques of LLMs, focusing on pre-training, adaptation tuning, utilization, and capacity evaluation.
