The paper proposes a prompt ensembling method for large language models called 'boosted prompting', which uses a small dataset to construct a set of few-shot prompts that together comprise a boosted prompt ensemble. The few-shot examples for each prompt are chosen in a stepwise fashion to be 'hard': examples that the prompts already in the ensemble answer incorrectly or inconsistently.
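The stepwise construction can be sketched as a greedy loop, here with a placeholder `answer_fn` standing in for a model call (the function names and parameters are illustrative, not the paper's API):

```python
def boosted_prompt_ensemble(train_set, answer_fn, max_prompts=3, shots=4):
    """Greedy sketch of boosted prompting: repeatedly collect training
    examples the current ensemble still gets wrong and turn them into
    the few-shot exemplars of the next prompt."""
    ensemble = [[]]  # start from a zero-shot (empty) prompt
    for _ in range(max_prompts):
        hard = []
        for question, gold in train_set:
            answers = [answer_fn(prompt, question) for prompt in ensemble]
            if gold not in answers:        # no prompt in the ensemble solves it
                hard.append((question, gold))
        if not hard:                       # every example is covered; stop
            break
        ensemble.append(hard[:shots])      # new prompt built from hard examples
    return ensemble
```

At inference time the prompts in the ensemble would each be queried and their answers aggregated, e.g. by majority vote.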
The paper explores the use of language models (LMs) to automatically generate evaluations for testing LM behaviors. The generated evaluations are diverse and of high quality, and the approach is significantly cheaper, lower effort, and faster than manual data creation. The paper discovers new cases of inverse scaling, where LM behavior gets worse with scale, including sycophancy in models trained with RLHF.
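The generate-then-filter pipeline can be sketched as follows, with `generate_fn` and `judge_fn` as placeholder model calls (the names, the deduplication step, and the threshold are assumptions for illustration):

```python
def generate_evaluation(generate_fn, judge_fn, behavior, n=100, threshold=0.5):
    """Sketch of a model-written-evaluations pipeline: one LM call
    proposes labeled test items for a target behavior, and a second
    LM pass filters out low-quality or mislabeled items."""
    items = [generate_fn(behavior) for _ in range(n)]
    unique = list(dict.fromkeys(items))          # drop exact duplicates
    return [it for it in unique if judge_fn(behavior, it) >= threshold]
```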
The paper proposes Automatic Prompt Engineer (APE), an algorithm that generates and selects natural language instructions for large language models (LLMs) to improve task performance. APE treats the instruction as a program and optimizes it by searching over a pool of instruction candidates proposed by an LLM, selecting the candidate with the highest score on the target task.
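The propose-and-score search can be sketched like this, with `propose_fn` and `execute_fn` as stand-ins for the proposing and task-executing LLM calls (the function names and accuracy-based scoring are illustrative assumptions):

```python
def ape_select(propose_fn, execute_fn, demos, eval_set, n_candidates=8):
    """Sketch of an APE-style search loop: an LLM proposes candidate
    instructions from input-output demonstrations, and each candidate
    is scored by executing it on held-out examples."""
    candidates = {propose_fn(demos) for _ in range(n_candidates)}

    def accuracy(instruction):
        hits = sum(execute_fn(instruction, x) == y for x, y in eval_set)
        return hits / len(eval_set)

    return max(candidates, key=accuracy)  # keep the best-scoring instruction
```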
The paper introduces AUTOPROMPT, an automated method to create prompts for a diverse set of tasks based on a gradient-guided search. The prompts elicit more accurate factual knowledge from masked language models (MLMs) than manually created prompts on the LAMA benchmark. MLMs can perform sentiment analysis and natural language inference without additional fine-tuning when given AUTOPROMPT-generated prompts.
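The core of the gradient-guided search is a HotFlip-style candidate step: rank vocabulary tokens by a first-order estimate of how swapping them into a trigger slot would change the loss. A toy NumPy sketch (the function name and the toy embedding matrix are illustrative, not the paper's code):

```python
import numpy as np

def candidate_triggers(embeddings, current_id, grad, k=5):
    """One sketch step of AUTOPROMPT-style gradient-guided search:
    estimate, to first order, the loss change from replacing the
    current trigger token with each vocabulary token, and return the
    k tokens predicted to lower the loss most."""
    # first-order loss change for swapping token w into the slot:
    # (e_w - e_current) . dL/de
    delta = (embeddings - embeddings[current_id]) @ grad
    return np.argsort(delta)[:k]
```

In the full method, the top-k candidates would then be re-evaluated exactly and the best swap kept, iterating over trigger positions.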