Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models

Pan Lu; Baolin Peng; Hao Cheng; Michel Galley; Kai-Wei Chang; Ying Nian Wu; Song-Chun Zhu; Jianfeng Gao

Large language models (LLMs) have made remarkable progress on a variety of natural language processing tasks and exhibit emergent abilities. However, they have inherent limitations: they cannot access up-to-date information, use external tools, or perform precise mathematical reasoning. In this paper, we introduce Chameleon, a plug-and-play compositional reasoning framework that augments LLMs to help address these challenges. Chameleon synthesizes programs that compose various tools, including LLMs, off-the-shelf vision models, web search engines, Python functions, and rule-based modules tailored to user interests. Built on top of an LLM that acts as a natural language planner, Chameleon infers the appropriate sequence of tools to compose and execute in order to generate a final response. We showcase the adaptability and effectiveness of Chameleon on two tasks: ScienceQA and TabMWP. Notably, Chameleon with GPT-4 achieves 86.54% accuracy on ScienceQA, significantly improving upon the best published few-shot model by 11.37%. On TabMWP, Chameleon with GPT-4 as the underlying LLM achieves a 17.8% increase over the state-of-the-art model, reaching 98.78% overall accuracy. Further studies suggest that GPT-4 as a planner selects tools more consistently and rationally than other LLMs such as ChatGPT, and can infer potential constraints from the instructions.
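The plan-then-execute idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the module names, the shared `state` dictionary, and the rule-based `plan` function are all assumptions made for the example. In Chameleon itself the planner is an LLM prompted in natural language, and the tools include vision models and web search rather than the toy stand-ins below.

```python
from typing import Callable, Dict, List

# Assumption: each module reads a shared state dict and returns an updated copy.
Module = Callable[[dict], dict]


def knowledge_retrieval(state: dict) -> dict:
    # Hypothetical stand-in for an LLM or web-search call.
    return {**state, "knowledge": f"background facts for: {state['question']}"}


def table_lookup(state: dict) -> dict:
    # Hypothetical stand-in for a table-processing tool: collect the cell values.
    table = state.get("table", {})
    return {**state, "values": [table[key] for key in sorted(table)]}


def program_executor(state: dict) -> dict:
    # Hypothetical stand-in for running a generated Python program.
    return {**state, "result": sum(state["values"])}


def answer_generator(state: dict) -> dict:
    # Prefer a computed result; otherwise fall back to retrieved knowledge.
    return {**state, "answer": state.get("result", state.get("knowledge"))}


# Plug-and-play registry: adding a tool means adding one entry here.
MODULES: Dict[str, Module] = {
    "knowledge_retrieval": knowledge_retrieval,
    "table_lookup": table_lookup,
    "program_executor": program_executor,
    "answer_generator": answer_generator,
}


def plan(state: dict) -> List[str]:
    # Assumption: a fixed rule replaces the LLM planner to keep the sketch runnable.
    if "table" in state:
        return ["table_lookup", "program_executor", "answer_generator"]
    return ["knowledge_retrieval", "answer_generator"]


def run(state: dict, modules: Dict[str, Module] = MODULES) -> dict:
    # The "program" is just an ordered list of module names, executed in sequence.
    program = plan(state)
    for name in program:
        state = modules[name](state)
    return {**state, "program": program}


if __name__ == "__main__":
    out = run({"question": "What is the total?", "table": {"a": 2, "b": 3}})
    print(out["program"], out["answer"])
```

Representing the program as a list of module names keeps the planner and the tools decoupled, so swapping the rule in `plan` for an LLM call would not require changes to `run` or to the modules.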

updated: Wed Apr 19 2023 17:47:47 GMT+0000 (UTC)

published: Wed Apr 19 2023 17:47:47 GMT+0000 (UTC)

arXiv
