CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense

Difei Gao; Ruiping Wang; Shiguang Shan; Xilin Chen

Reasoning alternately over visual facts and commonsense is fundamental to an advanced VQA system. This ability requires models to go beyond a literal understanding of commonsense. A system should not merely treat objects as entry points for querying background knowledge; it should fully ground commonsense in the visual world and imagine the possible relationships between objects, e.g., "fork, can lift, food". To comprehensively evaluate such abilities, we propose a VQA benchmark, CRIC, which introduces new types of questions about Compositional Reasoning on vIsion and Commonsense, and an evaluation metric that integrates answer correctness and commonsense grounding. To collect such questions, along with the rich additional annotations needed to support the metric, we also propose an automatic algorithm that generates question samples from the scene graphs associated with the images and the relevant knowledge graph. We further analyze several representative types of VQA models on the CRIC dataset. Experimental results show that grounding commonsense in image regions and jointly reasoning over vision and commonsense remain challenging for current approaches. The dataset is available at https://cricvqa.github.io.
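
As a rough illustration of the two ideas in the abstract (generating questions from a scene graph plus a knowledge graph, and scoring answers jointly with grounding), the following is a minimal sketch. It is not the authors' algorithm or the official CRIC metric: the toy scene graph, the region ids, the question template, and the names `generate_samples` and `joint_score` are all assumptions made for this example. Only the triple "fork, can lift, food" comes from the abstract.

```python
from dataclasses import dataclass

# Hypothetical toy inputs (not taken from the CRIC dataset).
REGION_LABELS = {"r1": "fork", "r2": "plate", "r3": "cup"}      # region id -> object label
SCENE_GRAPH = [("r1", "on", "r2"), ("r3", "next to", "r2")]     # (subject, relation, object) over region ids
KNOWLEDGE = [("fork", "can lift", "food")]                      # commonsense triple from the abstract


@dataclass(frozen=True)
class Sample:
    question: str
    answer: str   # expected answer string
    region: str   # region id the commonsense fact should be grounded to


def generate_samples(labels, scene, knowledge):
    """Pair each commonsense triple with scene-graph edges whose subject matches its head."""
    samples = []
    for head, relation, tail in knowledge:
        for subj, scene_rel, obj in scene:
            if labels[subj] == head:
                # Assumed template: one visual hop plus one commonsense hop.
                question = f"Which object {scene_rel} the {labels[obj]} {relation} {tail}?"
                samples.append(Sample(question, head, subj))
    return samples


def joint_score(predictions, samples):
    """Fraction of samples where BOTH the answer and the grounded region are correct.

    `predictions` is a list of (answer, region_id) pairs aligned with `samples`.
    This all-or-nothing combination is an assumption of this sketch.
    """
    if not samples:
        return 0.0
    hits = sum(
        ans == s.answer and region == s.region
        for (ans, region), s in zip(predictions, samples)
    )
    return hits / len(samples)


samples = generate_samples(REGION_LABELS, SCENE_GRAPH, KNOWLEDGE)
print(samples[0].question)
print(joint_score([("fork", "r1")], samples))  # right answer, right region
print(joint_score([("fork", "r3")], samples))  # right answer, wrong region
```

In this sketch a prediction that names the right object but points to the wrong image region earns no credit, which is one simple way to make a score reflect grounding as well as answering.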

updated: Wed Oct 27 2021 02:22:47 GMT+0000 (UTC)

published: Thu Aug 08 2019 08:07:35 GMT+0000 (UTC)

arXiv
