SLiC-HF: Sequence Likelihood Calibration with Human Feedback - Summary

Arxiv URL: https://arxiv.org/abs/2305.10425

Authors: Yao Zhao, Rishabh Joshi, Tianqi Liu, Misha Khalman, Mohammad Saleh, Peter J. Liu

Summary:

The paper presents SLiC-HF, Sequence Likelihood Calibration with Human Feedback, a method that calibrates a language model's sequence likelihoods on human preference data. On the Reddit TL;DR summarization task it significantly improves over supervised fine-tuning baselines while being a simpler and more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF).
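
At its core, the approach replaces the RLHF training loop with a calibration objective computed on pairs of better/worse sequences. Below is a minimal PyTorch sketch of such a rank-calibration loss with a cross-entropy regularizer; the margin `delta` and weight `lambda_reg` are assumed placeholder values, not the paper's reported hyperparameters:

```python
import torch

def slic_hf_loss(logp_pos, logp_neg, logp_sft_target, delta=1.0, lambda_reg=0.5):
    """Sketch of a SLiC-style calibration objective.

    logp_pos:        sequence log-likelihood of the human-preferred summary, shape (batch,)
    logp_neg:        sequence log-likelihood of the human-rejected summary, shape (batch,)
    logp_sft_target: log-likelihood of the supervised fine-tuning target, used as a
                     regularizer that keeps the model close to its fine-tuned
                     starting point, shape (batch,)
    delta, lambda_reg: assumed margin and regularization weight
    """
    # Rank-calibration term: hinge loss on the log-likelihood margin between
    # the preferred and rejected sequences.
    calibration = torch.clamp(delta - logp_pos + logp_neg, min=0.0)
    # Regularization term: cross-entropy (negative log-likelihood) on the
    # supervised fine-tuning target.
    regularization = -logp_sft_target
    return (calibration + lambda_reg * regularization).mean()
```

The positive/negative pairs can come directly from human preference labels or from ranked model samples, as noted under Key Insights below.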

Key Insights & Learnings:

  • SLiC-HF adapts Sequence Likelihood Calibration to learn from human feedback, either by calibrating directly on human preference pairs or by ranking candidates sampled from the supervised fine-tuned (SFT) model.
  • Human and automatic evaluations on the Reddit TL;DR summarization task show SLiC-HF significantly improving over supervised fine-tuning baselines.
  • SLiC-HF is a simpler and more computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF).
  • The paper demonstrates that SLiC-HF can be applied to human feedback data collected for a different model, analogous to off-policy, offline RL data; a hypothetical pair-construction sketch follows this list.
  • SLiC-HF presents a competitive alternative to the PPO RLHF implementation used in past work while being much simpler to implement, easier to tune, and more computationally efficient in practice.
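
One way the paper connects off-policy feedback to training pairs is its sample-and-rank variant: candidates are decoded from the SFT model, and a ranking or reward model trained on the human feedback selects the best and worst as the positive/negative calibration pair. The sketch below is a hypothetical illustration; `sft_model.generate` and `reward_model.score` are placeholder helpers rather than the paper's actual API, and `num_candidates` is an assumed parameter.

```python
def build_calibration_pair(sft_model, reward_model, document, num_candidates=8):
    """Hypothetical sample-and-rank pair construction for SLiC-HF.

    `sft_model.generate` and `reward_model.score` are placeholder helpers:
    the first samples a candidate summary from the supervised fine-tuned
    model, the second assigns a candidate a scalar preference score.
    """
    candidates = [sft_model.generate(document) for _ in range(num_candidates)]
    scores = [reward_model.score(document, c) for c in candidates]
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0])
    negative = ranked[0][1]    # lowest-scoring candidate
    positive = ranked[-1][1]   # highest-scoring candidate
    return positive, negative
```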


Terms Mentioned: Sequence Likelihood Calibration, Human Feedback, Reinforcement Learning, TL;DR Summarization Task, Supervised Fine-Tuning, ROUGE, Off-Policy, Calibration Loss, Cross-Entropy Loss, Ranking Model, Reward Model, T5 Models, Batch Size, Learning Rate, Perplexity, Beam-Search, Win Rate, Automatic Evaluation, Pointwise Rating

Technologies / Libraries Mentioned: Google DeepMind