Large Language Models Are Human-Level Prompt Engineers - Summary

arXiv URL: https://arxiv.org/abs/2211.01910

Authors: Yongchao Zhou, Andrei Ioan Muresanu, Ziwen Han, Keiran Paster, Silviu Pitis, Harris Chan, Jimmy Ba

Summary:

The paper proposes Automatic Prompt Engineer (APE), an algorithm that automatically generates and selects natural language instructions for large language models (LLMs) to improve task performance. APE treats the instruction as a "program" and optimizes it by searching over a pool of instruction candidates proposed by an LLM so as to maximize a chosen score function; the selected instruction is then evaluated for zero-shot performance with another LLM. APE outperforms prior LLM baselines and matches or exceeds human-written instructions, reaching human-level performance on 24/24 Instruction Induction tasks and 17/21 curated BIG-Bench tasks.
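
At its core, APE is a propose-score-select loop. Below is a minimal sketch of that loop in Python; `llm_generate` and `llm_score` are hypothetical stand-ins for calls to the proposal and scoring LLMs, and the meta-prompt wording is only indicative of the reversed-generation templates the paper uses, not its exact implementation.

```python
import random

def llm_generate(meta_prompt: str, n: int) -> list[str]:
    """Sample n candidate instructions from the proposal LLM (stub)."""
    raise NotImplementedError("wire up an LLM client here")

def llm_score(instruction: str, inputs: list[str], targets: list[str]) -> float:
    """Score an instruction on held-out pairs, e.g. zero-shot execution
    accuracy or log-likelihood of the targets under the target LLM (stub)."""
    raise NotImplementedError("wire up an LLM client here")

def ape(demos: list[tuple[str, str]], n_candidates: int = 50,
        eval_subset: int = 20) -> str:
    """Infer an instruction from input-output demonstrations."""
    # 1. Propose: show the LLM demonstrations and ask it to induce the
    #    instruction that could have produced them.
    demo_text = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demos)
    meta_prompt = ("I gave a friend an instruction. Based on the instruction "
                   "they produced the following input-output pairs:\n"
                   f"{demo_text}\nThe instruction was:")
    candidates = llm_generate(meta_prompt, n_candidates)

    # 2. Score: evaluate each candidate on a random subset of the data
    #    to keep the number of scoring calls manageable.
    subset = random.sample(demos, min(eval_subset, len(demos)))
    inputs, targets = map(list, zip(*subset))
    scored = [(llm_score(c, inputs, targets), c) for c in candidates]

    # 3. Select: return the highest-scoring instruction.
    return max(scored)[1]
```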

Key Insights & Learnings:

  • APE automatically generates and selects natural language instructions for LLMs to improve task performance.
  • APE treats the instruction as a program and optimizes it by searching over a pool of instruction candidates proposed by an LLM.
  • The selected instruction is evaluated for zero-shot performance by another LLM.
  • APE outperforms prior LLM baselines and achieves better or comparable performance to human-generated instructions on various tasks.
  • APE can also improve few-shot in-context learning, find better zero-shot chain-of-thought prompts, and steer models toward truthfulness and/or informativeness (see the example below).
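
As a concrete instance of the last insight: the paper reports that APE discovered the zero-shot chain-of-thought trigger "Let's work this out in a step by step way to be sure we have the right answer.", which outperformed the human-written "Let's think step by step" on reasoning benchmarks. A minimal usage sketch follows; `llm_complete` is a hypothetical stand-in for an LLM completion call.

```python
# The zero-shot CoT trigger APE discovered, per the paper.
APE_COT = ("Let's work this out in a step by step way "
           "to be sure we have the right answer.")

def llm_complete(prompt: str) -> str:
    """Hypothetical LLM completion call (stub)."""
    raise NotImplementedError("wire up an LLM client here")

def answer_with_cot(question: str) -> str:
    # Append the trigger so the model reasons step by step before answering.
    return llm_complete(f"Q: {question}\nA: {APE_COT}")
```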

Limitations:

  • APE requires additional computational resources: candidate generation and scoring both consume extra LLM calls.
  • Instructions selected for one model do not always transfer well to other models.
  • Results depend on the quality of the chosen scoring function (e.g., execution accuracy vs. log-likelihood).
  • Some tasks may still need task-specific tuning of the search.

Terms Mentioned: large language models, natural language program synthesis, black-box optimization, inference model, zero-shot learning, instruction induction, prompt engineering, few-shot learning, chain-of-thought prompts

Technologies / Libraries Mentioned: PyTorch, Hugging Face Transformers