paper summaries

Boosted Prompt Ensembles for Large Language Models -Summary

The paper proposes a prompt ensembling method for large language models called 'boosted prompting', which uses a small dataset to construct a set of few shot prompts that together comprise a boosted prompt ensemble. The few shot examples for each prompt are chosen in a stepwise fashion to be 'hard'

Arxiv URL: https://arxiv.org/abs/2304.05970

Authors: Silviu Pitis, Michael R. Zhang, Andrew Wang, Jimmy Ba

What is Boosted Prompt Ensembling?

The paper proposes a prompt ensembling method for large language models called 'boosted prompting'. This method uses a small dataset to construct a set of few shot prompts that together comprise a boosted prompt ensemble. The few short examples for each prompt are chosen in a stepwise fashion to be 'hard' examples on which the previous step's ensemble is uncertain. The algorithm outperforms single-prompt output-space ensembles and bagged prompt-space ensembles on challenging datasets like GSM8k and AQuA.

Boost prompt ensembles for LLMs

What are the key advantages?

Improves performance over single-prompt and bagged ensembles
Works with as few as 100 labeled examples
Requires minimal manual prompt engineering
Complements other reasoning techniques like chain-of-thought
Can potentially adapt to distribution shifts in test-time variant
Achieves strong results on multiple reasoning benchmarks

What are the limitations/disadvantages?

Requires at least some labeled data for best performance
Test-time version generally performs worse than the train-time version
Performance depends on the quality of the initial prompt
It may not help weaker language models (e.g., didn't improve Curie's performance)
Computationally more expensive than single prompts
Requires sufficient model accuracy to generate reliable self-supervision

What datasets was it tested on?

AQUA (algebra questions)
GSM8K (grade school math)
MMLU570 (multi-task understanding)
CMATH420 (competition math)
SVAMP (arithmetic word problems)

What are the key insights & learnings?

Boosted prompting outperforms single-prompt output-space ensembles and bagged prompt-space ensembles on challenging datasets like GSM8k and AQuA.
The algorithm uses a small dataset to construct a set of a few shot prompts that together comprise a boosted prompt ensemble.
The few shot examples for each prompt are chosen in a stepwise fashion to be 'hard' examples on which the previous step's ensemble is uncertain.
The algorithm proposes both train-time and test-time versions of boosted prompting that use different levels of available annotation.
The algorithm leverages a small dataset to construct a set of few shot prompts that progressively solve more of the problems, inspired by classical boosting algorithms.

Terms Mentioned: Large Language Models, few shot prompts, boosted prompting, output-space ensembles, bagged prompt-space ensembles, GSM8k, AQuA, train-time boosting, test-time boosting, annotation

Technologies / Libraries Mentioned: PyTorch, Hugging Face Transformers

Boosted Prompt Ensembles for Large Language Models -Summary

Arxiv URL: https://arxiv.org/abs/2304.05970

Authors: Silviu Pitis, Michael R. Zhang, Andrew Wang, Jimmy Ba

What is Boosted Prompt Ensembling?

What are the key advantages?

What are the limitations/disadvantages?

What datasets was it tested on?

What are the key insights & learnings?

Read next

Instruction Tuning with GPT-4 - Summary

Are We Really Making Much Progress in Text Classification? A Comparative Review - Summary

A Survey of Large Language Models - Summary

Arxiv URL: https://arxiv.org/abs/2304.05970Authors: Silviu Pitis, Michael R. Zhang, Andrew Wang, Jimmy BaWhat is Boosted Prompt Ensembling?

What are the key advantages?

What are the limitations/disadvantages?

What datasets was it tested on?

What are the key insights & learnings?

Read next

Arxiv URL: https://arxiv.org/abs/2304.05970

Authors: Silviu Pitis, Michael R. Zhang, Andrew Wang, Jimmy Ba

What is Boosted Prompt Ensembling?