LoRA: Low-Rank Adaptation of Large Language Models - Summary
Arxiv URL: https://arxiv.org/abs/2106.09685
Authors: Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
Summary:
The paper proposes Low-Rank Adaptation (LoRA) as an approach to reduce the number of trainable parameters needed to adapt large pre-trained models to downstream tasks in natural language processing. LoRA freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters per task. LoRA performs on par with or better than full fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3, despite having fewer trainable parameters, a higher training throughput, and, unlike adapter-based methods, no additional inference latency.
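To make the idea concrete, below is a minimal PyTorch sketch of a LoRA-adapted linear layer, computing h = W0·x + (alpha/r)·B·A·x with W0 frozen and only the low-rank factors A and B trainable. This is an illustrative reimplementation, not the authors' loralib code; the class name, rank r, and scaling alpha are assumed defaults chosen for the example.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of a LoRA-adapted linear layer: h = W0 x + (alpha / r) * B A x."""

    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # Frozen pre-trained weight W0; in practice copied from the original
        # model and never updated during adaptation (random here for the sketch).
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)
        # Trainable rank-decomposition factors: A is (r x in), B is (out x r).
        # B starts at zero so B @ A = 0 and training begins from the
        # pre-trained model's behavior.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        frozen = x @ self.weight.T                      # original frozen path
        update = (x @ self.lora_A.T) @ self.lora_B.T    # low-rank update path
        return frozen + self.scaling * update

# Usage: only lora_A and lora_B carry gradients, so optimizer state is tiny.
layer = LoRALinear(768, 768, r=4)
y = layer(torch.randn(2, 768))
```

Because only A and B (2 x r x d parameters per adapted weight matrix) are trained, the optimizer only tracks gradients and moments for those factors, which is the source of the memory savings quantified in the insights below.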
Key Insights & Learnings:
- LoRA reduces the number of trainable parameters for downstream tasks in natural language processing by injecting trainable rank decomposition matrices into each layer of the Transformer architecture.
- LoRA performs on par with or better than full fine-tuning in model quality on RoBERTa, DeBERTa, GPT-2, and GPT-3.
- LoRA can reduce the number of trainable parameters by 10,000 times and the GPU memory requirement by 3 times compared to GPT-3 175B fine-tuned with Adam.
- LoRA allows a single pre-trained model to be shared across tasks, with many small LoRA modules built for different tasks, reducing the storage requirement and task-switching overhead (see the sketch after this list).
- LoRA makes training more efficient and lowers the hardware barrier to entry by up to 3 times when using adaptive optimizers since we do not need to calculate the gradients or maintain the optimizer states for most parameters.
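The "no additional inference latency" and cheap task-switching claims follow from the fact that the low-rank update B·A can be merged into the frozen weight for deployment and subtracted back out when swapping tasks. The sketch below illustrates that bookkeeping; the function names merge_lora and swap_task are hypothetical, not the authors' loralib API.

```python
import torch

def merge_lora(w0, lora_A, lora_B, scaling):
    # Fold the low-rank update into the frozen weight: W = W0 + scaling * (B @ A).
    # Inference then uses a single dense matmul, so no extra latency is added.
    return w0 + scaling * (lora_B @ lora_A)

def swap_task(w_merged, old_A, old_B, new_A, new_B, scaling):
    # Switch tasks by subtracting one LoRA module's update and adding another's,
    # without storing a full copy of the pre-trained weights per task.
    return w_merged - scaling * (old_B @ old_A) + scaling * (new_B @ new_A)
```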
Terms Mentioned: Low-Rank Adaptation, Transformer architecture, fine-tuning, RoBERTa, DeBERTa, GPT-2, GPT-3, rank decomposition matrices, inference latency, language modeling
Technologies / Libraries Mentioned: PyTorch