Non-monotonic Logical Reasoning Guiding Deep Learning for Explainable Visual Question Answering

Heather Riley; Mohan Sridharan

State-of-the-art algorithms for many pattern recognition problems rely on deep network models. Training these models requires a large labeled dataset and considerable computational resources. It is also difficult to understand how these learned models work, which limits their use in some critical applications. To address these limitations, our architecture draws inspiration from research in cognitive systems and integrates the principles of commonsense logical reasoning, inductive learning, and deep learning. In the context of answering explanatory questions about scenes and the underlying classification problems, the architecture uses deep networks to extract features from images and to generate answers to queries. Between these deep networks, it embeds components for non-monotonic logical reasoning with incomplete commonsense domain knowledge and for decision tree induction. It also incrementally learns, and reasons with, previously unknown constraints governing the domain's states. We evaluated the architecture on datasets of simulated and real-world images, and on a simulated robot that computes, executes, and provides explanatory descriptions of plans. Experimental results indicate that, in comparison with an "end-to-end" architecture of deep networks, our architecture provides better accuracy on classification problems when the training dataset is small, comparable accuracy with larger datasets, and more accurate answers to explanatory questions. Furthermore, incremental acquisition of previously unknown constraints improves the ability to answer explanatory questions, and extending non-monotonic logical reasoning to support planning and diagnostics improves the reliability and efficiency of computing and executing plans on a simulated robot.
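To make the term "non-monotonic" concrete, the following is a minimal Python sketch of default reasoning over incomplete knowledge, with a learned model used only when the rules cannot decide. It is an illustration under stated assumptions, not the authors' implementation: the feature names (`num_blocks`, `small_base`), the rules, the threshold in `learned_fallback`, and the reason-first-then-fall-back control flow are all hypothetical and do not come from the abstract.

```python
from typing import Optional, Tuple


def stable_by_default(features: dict) -> Optional[bool]:
    """Default reasoning over possibly incomplete features (hypothetical rules).

    Returns True or False when the rules settle the answer, and None when
    the available knowledge is insufficient.
    """
    # Exception: a known constraint defeats the default conclusion.
    if features.get("small_base") is True:
        return False
    # Default: a short structure is assumed stable unless an exception is known.
    num_blocks = features.get("num_blocks")
    if num_blocks is not None and num_blocks <= 2:
        return True
    return None  # the rules cannot decide


def learned_fallback(features: dict) -> bool:
    """Stand-in for a model induced from data, such as a decision tree.

    The threshold is invented for illustration only.
    """
    return features.get("num_blocks", 0) <= 3


def classify(features: dict) -> Tuple[bool, str]:
    """Return the decision and which component produced it."""
    verdict = stable_by_default(features)
    if verdict is not None:
        return verdict, "reasoning"
    return learned_fallback(features), "learned"
```

Calling `classify({"num_blocks": 2})` yields a "stable" conclusion from the default rule, while `classify({"num_blocks": 2, "small_base": True})` retracts it. Adding a fact withdrew a conclusion that was previously drawn, which is the property that distinguishes non-monotonic from classical reasoning. Returning the source of each decision alongside the decision itself is one simple way such a pipeline could support explanatory answers.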

updated: Mon Sep 23 2019 23:34:32 GMT+0000 (UTC)

published: Mon Sep 23 2019 23:34:32 GMT+0000 (UTC)

arXiv
