SLiC-HF: Sequence Likelihood Calibration with Human Feedback - Summary

Arxiv URL: https://arxiv.org/abs/2305.10425

Authors: Yao Zhao, Rishabh Joshi, Tianqi Liu, Misha Khalman, Mohammad Saleh, Peter J. Liu

Summary:

The paper shows that Sequence Likelihood Calibration (SLiC), a previously introduced training technique, can be used to learn from human preference data; the authors call the resulting method SLiC-HF. In experiments on the TL;DR summarization task, SLiC-HF significantly improves on supervised fine-tuning baselines, and it is a simpler and more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF).

Key Insights & Learnings:

  • SLiC-HF applies Sequence Likelihood Calibration (SLiC) to human preference data, so the language model learns directly from which outputs people prefer.
  • On the TL;DR summarization task, SLiC-HF significantly improves on supervised fine-tuning (SFT) baselines.
  • RLHF optimizes the language model with reinforcement learning against scores from a reward model trained on human preference data; SLiC-HF replaces that loop with a calibration loss, which makes it simpler and more computationally efficient.
  • The paper demonstrates that SLiC-HF can be trained on human feedback data collected for a different model, similar to off-policy, offline RL data.
  • SLiC-HF presents a competitive alternative to the PPO RLHF implementation used in past work while being much simpler to implement, easier to tune and more computationally efficient in practice.
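
To make the "calibration loss plus cross-entropy loss" idea more concrete, here is a minimal sketch, assuming a hinge-style rank calibration term on one preferred/dispreferred pair and a cross-entropy regularizer toward a reference summary. The function name, argument names, and default values for the margin and the regularization weight are illustrative assumptions, not settings taken from the paper, and the sequence log-likelihoods are passed in as plain numbers rather than computed by a model.

```python
def slic_hf_loss(logp_pos, logp_neg, logp_ref, delta=1.0, lam=0.5):
    """Illustrative per-example loss (names and defaults are assumptions).

    logp_pos: log-likelihood of the preferred sequence under the model
    logp_neg: log-likelihood of the dispreferred sequence under the model
    logp_ref: log-likelihood of a reference (e.g. SFT target) sequence
    delta:    margin the preferred sequence should win by
    lam:      weight of the cross-entropy regularizer
    """
    # Rank calibration term: zero once the preferred sequence is at
    # least `delta` more likely (in log space) than the dispreferred one.
    rank_term = max(0.0, delta - logp_pos + logp_neg)
    # Cross-entropy regularizer: keeps the reference sequence likely.
    reg_term = -lam * logp_ref
    return rank_term + reg_term


# Preferred sequence already ahead by more than the margin:
# only the regularizer contributes.
well_ranked = slic_hf_loss(logp_pos=-1.0, logp_neg=-3.0, logp_ref=-2.0)

# Preferred sequence is less likely than the dispreferred one:
# the rank term becomes positive and pushes the two apart.
mis_ranked = slic_hf_loss(logp_pos=-3.0, logp_neg=-1.0, logp_ref=-2.0)
```

In a real training setup the three log-likelihoods would come from the model being fine-tuned, and the loss would be averaged over a batch; there is no reward-model rollout or policy-gradient step in this sketch, which is the sense in which the approach is simpler than PPO-based RLHF.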


Terms Mentioned: Sequence Likelihood Calibration, Human Feedback, Reinforcement Learning, TL;DR Summarization Task, Supervised Fine-Tuning, ROUGE, Off-Policy, Calibration Loss, Cross-Entropy Loss, Ranking Model, Reward Model, T5 Models, Batch Size, Learning Rate, Perplexity, Beam-Search, Win Rate, Automatic Evaluation, Pointwise Rating

Technologies / Libraries Mentioned: Google DeepMind
