REX: Reasoning-aware and Grounded Explanation

Shi Chen; Qi Zhao

Effectiveness and interpretability are two essential properties of trustworthy AI systems. Most recent studies in visual reasoning focus on improving the accuracy of predicted answers and pay less attention to explaining the rationales behind the decisions. As a result, these approaches commonly exploit spurious biases instead of actually reasoning over the visual-textual data, and have yet to develop the capability to explain their decision making by considering key information from both modalities. This paper aims to close the gap from three distinct perspectives. First, we define a new type of multi-modal explanation that explains decisions by progressively traversing the reasoning process and grounding keywords in the images. We develop a functional program to sequentially execute different reasoning steps and construct a new dataset with 1,040,830 multi-modal explanations. Second, we identify the critical need to tightly couple important components across the visual and textual modalities when explaining decisions, and propose a novel explanation generation method that explicitly models the pairwise correspondence between words and regions of interest. The method improves visual grounding capability by a considerable margin, resulting in enhanced interpretability and reasoning performance. Finally, with our new data and method, we perform extensive analyses to study the effectiveness of our explanations under different settings, including multi-task learning and transfer learning. Our code and data are available at https://github.com/szzexpoi/rex.
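
To make the idea of pairwise correspondence between words and regions of interest concrete, here is a minimal sketch. It is not the authors' implementation. It assumes that words and regions are already embedded as vectors in a shared space, that cosine similarity is the scoring function, and that a softmax over regions turns scores into weights. The function names (`word_region_attention`, `ground_words`) and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_region_attention(word_vecs, region_vecs):
    """Return one row per word: a distribution over regions.

    Assumption: similarity is cosine and normalization is softmax;
    the paper's actual scoring function may differ.
    """
    return [softmax([cosine(w, r) for r in region_vecs]) for w in word_vecs]


def ground_words(word_vecs, region_vecs):
    """For each word, pick the index of the region with the highest weight."""
    attn = word_region_attention(word_vecs, region_vecs)
    return [max(range(len(row)), key=row.__getitem__) for row in attn]


# Toy example (hypothetical data): two words, three regions, 2-D embeddings.
words = [[1.0, 0.0], [0.0, 1.0]]
regions = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
grounding = ground_words(words, regions)
```

In this toy setup, the first word is assigned to the first region and the second word to the second region, since those pairs have the highest cosine similarity. A learned model would produce the embeddings and could replace the fixed similarity with a trained scoring function.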

updated: Fri Mar 11 2022 17:28:42 GMT+0000 (UTC)

published: Fri Mar 11 2022 17:28:42 GMT+0000 (UTC)

arXiv
