Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models - Summary

Arxiv URL: https://arxiv.org/abs/2304.09842v1

Authors: Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Jianfeng Gao

Summary:

The paper introduces Chameleon, a plug-and-play compositional reasoning framework that augments large language models (LLMs) to address their inherent limitations and tackle a broad range of reasoning tasks. Chameleon synthesizes programs to compose various tools, including LLMs, off-the-shelf vision models, web search engines, Python functions, and rule-based modules tailored to user interests. The framework is showcased on two tasks, ScienceQA and TabMWP, where it significantly improves state-of-the-art accuracy.

Key Insights & Learnings:

  • Chameleon is a plug-and-play compositional reasoning framework that augments LLMs to address their inherent limitations and tackle a broad range of reasoning tasks.
  • Chameleon synthesizes programs to compose various tools, including LLMs, off-the-shelf vision models, web search engines, Python functions, and rule-based modules tailored to user interests.
  • Chameleon is built on top of an LLM as a natural language planner, which infers the appropriate sequence of tools to compose and execute in order to generate a final response.
  • On ScienceQA, Chameleon achieves 86.54% accuracy, improving upon the best published few-shot model by 11.37%. On TabMWP, with GPT-4 as the underlying LLM, Chameleon achieves a 17.8% increase over the state-of-the-art model, reaching 98.78% overall accuracy.
  • As a planner, GPT-4 exhibits more consistent and rational tool selection than other LLMs such as ChatGPT, and is better able to infer potential constraints from the instructions.
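The planner-then-executor pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the module names, their behavior, and the hard-coded plan are all hypothetical stand-ins (in Chameleon, an LLM infers the module sequence in natural language from few-shot demonstrations).

```python
# Hedged sketch of Chameleon-style plug-and-play tool composition.
# Module names, the shared-state dict, and the fixed plan below are
# illustrative assumptions, not the paper's actual interfaces.

def knowledge_retrieval(state):
    # Stand-in for a retrieval module that adds background knowledge.
    state["knowledge"] = f"facts about: {state['query']}"
    return state

def solution_generator(state):
    # Stand-in for the final answer-producing module.
    state["answer"] = f"answer derived from {state.get('knowledge', state['query'])}"
    return state

# Registry of plug-and-play modules; new tools are added by name.
MODULES = {
    "Knowledge_Retrieval": knowledge_retrieval,
    "Solution_Generator": solution_generator,
}

def plan(query):
    # Stand-in for the LLM planner: Chameleon would prompt an LLM
    # (e.g. GPT-4) to infer this sequence; here it is hard-coded.
    return ["Knowledge_Retrieval", "Solution_Generator"]

def run(query):
    # Execute the planned modules in order, threading shared state through.
    state = {"query": query}
    for name in plan(query):
        state = MODULES[name](state)
    return state["answer"]
```

The key design point the paper highlights is that modules communicate through a shared context, so tools of very different kinds (vision models, search engines, Python functions) can be composed by the planner without changing one another's code.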


Terms Mentioned: large language models, Chameleon, compositional reasoning, LLMs, ScienceQA, TabMWP, GPT-3, GPT-4, natural language processing, mathematical reasoning, commonsense reasoning, external tools, plug-and-play modular approaches, web search engines, Python functions, rule-based modules, computer vision models, in-context learning, zero-shot settings, human feedback

Technologies / Libraries Mentioned: Microsoft Research