Portkey Blog Portkey Blog
  • Home
  • Production Guides
  • New Releases
  • Talks
  • Upcoming Events
  • Portkey Docs
Sign in Subscribe

PPO algorithm

Training language models to follow instructions with human feedback - Summary

The paper presents a method for aligning language models with user intent by fine-tuning with human feedback. The resulting models, called InstructGPT, show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Th
Rohit Agarwal 15 Apr 2023

Subscribe to Portkey Blog

  • Blog Home
  • Portkey Website
Portkey Blog © 2026. Powered by Ghost