Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning

Pengbo Hu; Ji Qi; Xingyu Li; Hong Li; Xinqi Wang; Bing Quan; Ruiyu Wang; Yi Zhou

A promising trend has emerged of using large language models (LLMs) to generate code-like plans for complex inference tasks such as visual reasoning. This paradigm, known as LLM-based planning, provides flexibility in problem solving and offers better interpretability. However, current research is mostly limited to basic scenarios of simple questions that can be answered straightforwardly in a few inference steps. Planning for the more challenging multi-hop visual reasoning tasks remains under-explored. Specifically, in multi-hop reasoning, the trade-off between accuracy and the complexity of plan search becomes prominent. The prevailing algorithms either address efficiency by employing fast one-stop generation or adopt a complex iterative generation method to improve accuracy. Neither balances efficiency and performance. Drawing inspiration from the dual system of cognition in the human brain, the fast and slow thinking processes, we propose a hierarchical plan-searching algorithm that integrates one-stop reasoning (fast) and Tree-of-Thought (slow). Our approach achieves strong performance while significantly reducing the number of inference steps. Moreover, we repurpose the PTR and the CLEVER datasets, developing a systematic framework for evaluating the performance and efficiency of LLM-based plan-search algorithms on reasoning tasks at different levels of difficulty. Extensive experiments demonstrate the superiority of our proposed algorithm in terms of performance and efficiency. The dataset and code will be released soon.
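The abstract does not spell out how the fast and slow modes are combined, so the following is only a minimal sketch under one simple assumption: try a single one-stop plan first, and fall back to a breadth-limited tree search over partial plans only if that plan fails a check. Every name here (`mixed_plan_search`, `fast_propose`, `expand`, `is_valid`, the toy `TARGET` plan and its step strings) is hypothetical and stands in for LLM calls and a plan evaluator; none of it is taken from the paper's code.

```python
from typing import Callable, List, Optional, Tuple

Plan = List[str]


def mixed_plan_search(
    fast_propose: Callable[[], Plan],
    expand: Callable[[Plan], List[str]],
    is_valid: Callable[[Plan], bool],
    max_depth: int = 4,
    beam_width: int = 2,
) -> Tuple[Optional[Plan], int]:
    """Return (plan, number of generator calls). Illustrative only."""
    # Fast path: one generator call proposes a complete plan.
    calls = 1
    plan = fast_propose()
    if is_valid(plan):
        return plan, calls

    # Slow path: level-by-level tree search over partial plans,
    # keeping at most `beam_width` partial plans per level.
    frontier: List[Plan] = [[]]
    for _ in range(max_depth):
        next_frontier: List[Plan] = []
        for partial in frontier:
            calls += 1  # one generator call per expanded node
            for step in expand(partial):
                candidate = partial + [step]
                if is_valid(candidate):
                    return candidate, calls
                next_frontier.append(candidate)
        frontier = next_frontier[:beam_width]
    return None, calls


# Toy stand-ins (assumptions): a known target plan replaces a real evaluator.
TARGET: Plan = ["filter(cube)", "relate(left)", "query(color)"]


def is_valid(plan: Plan) -> bool:
    return plan == TARGET


def good_fast() -> Plan:
    return list(TARGET)


def bad_fast() -> Plan:
    return ["query(color)"]  # skips the intermediate hops


def expand(partial: Plan) -> List[str]:
    # Proposes the correct next step plus one distractor.
    i = len(partial)
    return [TARGET[i], "noop()"] if i < len(TARGET) else []
```

In this toy setup, `mixed_plan_search(good_fast, expand, is_valid)` finishes after one generator call, while `mixed_plan_search(bad_fast, expand, is_valid)` needs the tree search and uses more calls, which illustrates the efficiency versus accuracy trade-off the abstract describes.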

updated: Mon Aug 21 2023 03:08:52 GMT+0000 (UTC)

published: Fri Aug 18 2023 16:21:40 GMT+0000 (UTC)

arXiv
