SLiC-HF: Sequence Likelihood Calibration with Human Feedback - Summary
arXiv URL: https://arxiv.org/abs/2305.10425
Authors: Yao Zhao, Rishabh Joshi, Tianqi Liu, Misha Khalman, Mohammad Saleh, Peter J. Liu
Summary:
The paper presents SLiC-HF, which applies Sequence Likelihood Calibration with Human Feedback to align language models with human preferences. The approach is shown to be effective on the Reddit TL;DR summarization task and offers a simpler, more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF).
Key Insights & Learnings:
- SLiC-HF aligns language models with human preferences by calibrating sequence likelihoods on human feedback data, rather than optimizing a learned reward with reinforcement learning.
- The approach is shown to be effective on the TL;DR summarization task, improving over supervised fine-tuning (SFT) baselines.
- SLiC-HF is a simpler and more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF).
- Calibration can be done with human feedback data collected for a different model, analogous to off-policy, offline RL data.
- SLiC-HF presents a competitive alternative to the PPO RLHF implementation used in past work while being much simpler to implement, easier to tune, and more computationally efficient in practice (a sketch of the calibration objective follows this list).
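The calibration loss, cross-entropy regularization, and ranking-based pair selection mentioned above can be made concrete with a short sketch. This is an illustrative reconstruction based on the summary, not the authors' code: the function names, the margin `delta`, the regularizer weight `lam`, and all numeric values are assumptions.

```python
from typing import List, Tuple

def pick_pair(candidates: List[str], scores: List[float]) -> Tuple[str, str]:
    """Form a (preferred, rejected) pair from decoded candidate summaries.

    `scores` stand in for a ranking/reward model's judgments; they are plain
    floats here so the sketch stays self-contained.
    """
    ranked = sorted(zip(scores, candidates), reverse=True)
    return ranked[0][1], ranked[-1][1]

def slic_hf_loss(logp_pos: float, logp_neg: float, logp_ref: float,
                 delta: float = 1.0, lam: float = 0.5) -> float:
    """Rank-calibration hinge term plus a cross-entropy regularizer."""
    # Push the preferred sequence at least `delta` log-prob above the rejected one.
    calibration = max(0.0, delta - logp_pos + logp_neg)
    # Keep the model close to the supervised fine-tuning target so that
    # calibration does not degrade fluency.
    regularizer = -lam * logp_ref
    return calibration + regularizer

# Toy usage with made-up numbers: the preferred summary is only 0.3 nats above
# the rejected one, so the margin term is active and the loss encourages a
# larger gap.
pos, neg = pick_pair(["summary A", "summary B"], scores=[0.9, 0.2])
print(pos, neg, slic_hf_loss(logp_pos=-42.0, logp_neg=-42.3, logp_ref=-40.0))
```

Because this objective is computed on fixed positive/negative pairs rather than on-policy rollouts, it can reuse feedback collected for a different model, as noted in the list above.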
Terms Mentioned: Sequence Likelihood Calibration, Human Feedback, Reinforcement Learning, TL;DR Summarization Task, Supervised Fine-Tuning, ROUGE, Off-Policy, Calibration Loss, Cross-Entropy Loss, Ranking Model, Reward Model, T5 Models, Batch Size, Learning Rate, Perplexity, Beam-Search, Win Rate, Automatic Evaluation, Pointwise Rating
Technologies / Libraries Mentioned: Google DeepMind