Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes - Summary
The paper introduces Distilling step-by-step, a new mechanism that trains smaller models to outperform larger language models (LLMs) while requiring less training data and smaller model sizes. The mechanism extracts LLM-generated rationales and uses them as additional supervision for small models within a multi-task training framework: the small model is trained both to predict the task label and to generate the rationale that explains it.
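
To make the multi-task setup concrete, here is a minimal sketch of how the two objectives might be combined into a single training loss. The function name, tensor shapes, and the rationale-loss weight of 0.5 are illustrative assumptions, not the paper's exact settings:

```python
import torch
import torch.nn.functional as F

def distilling_step_by_step_loss(label_logits, label_targets,
                                 rationale_logits, rationale_targets,
                                 rationale_weight=0.5):
    """Multi-task loss: predict the task label AND generate the LLM rationale.

    rationale_weight (lambda) balances the two tasks; 0.5 is an assumed
    value for illustration, not necessarily the paper's setting.
    """
    # Standard token-level cross-entropy on the gold label sequence.
    label_loss = F.cross_entropy(
        label_logits.view(-1, label_logits.size(-1)),
        label_targets.view(-1),
        ignore_index=-100,  # mask out padding positions
    )
    # Same objective on the LLM-extracted rationale sequence.
    rationale_loss = F.cross_entropy(
        rationale_logits.view(-1, rationale_logits.size(-1)),
        rationale_targets.view(-1),
        ignore_index=-100,
    )
    return label_loss + rationale_weight * rationale_loss

# Toy usage with random tensors (batch=2, seq_len=4, vocab=10):
label_logits = torch.randn(2, 4, 10)
label_targets = torch.randint(0, 10, (2, 4))
rationale_logits = torch.randn(2, 4, 10)
rationale_targets = torch.randint(0, 10, (2, 4))
loss = distilling_step_by_step_loss(label_logits, label_targets,
                                    rationale_logits, rationale_targets)
print(loss)
```

Because the rationale term only changes the training loss, the small model pays no extra cost at inference time: it can be asked for just the label once trained.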