Training language models to follow instructions with human feedback - Summary
The paper presents a method for aligning language models with user intent by fine-tuning with human feedback. The resulting models, called InstructGPT, show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Th