Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation

Arijit Ray; Karan Sikka; Ajay Divakaran; Stefan Lee; Giedrius Burachas

While models for Visual Question Answering (VQA) have steadily improved over the years, interacting with one quickly reveals that these models lack consistency. For instance, if a model answers "red" to "What color is the balloon?", it might answer "no" if asked, "Is the balloon red?". These responses violate simple notions of entailment and raise questions about how effectively VQA models ground language. In this work, we introduce a dataset, ConVQA, and metrics that enable quantitative evaluation of consistency in VQA. For a given observable fact in an image (e.g., the balloon's color), we generate a set of logically consistent question-answer (QA) pairs (e.g., "Is the balloon red?") and also collect a human-annotated set of common-sense based consistent QA pairs (e.g., "Is the balloon the same color as tomato sauce?"). Further, we propose a consistency-improving data augmentation module, a Consistency Teacher Module (CTM). CTM automatically generates entailed (or similar-intent) questions for a source QA pair and fine-tunes the VQA model if the VQA model's answer to the entailed question is consistent with the source QA pair. We demonstrate that our CTM-based training improves the consistency of VQA models on the ConVQA datasets and is a strong baseline for further research.
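The balloon example in the abstract can be made concrete with a small sketch. This is not the paper's CTM or its ConVQA metric: the function names `entailed_yes_no` and `consistency_score`, the regex template, the `distractor` argument, and the stand-in `inconsistent_model` are all assumptions made up for illustration. A toy template turns one source QA pair into entailed yes/no questions, and the score is simply the fraction of those questions the model answers as expected.

```python
import re
from typing import Callable, List, Tuple

QAPair = Tuple[str, str]  # (question, expected answer)


def entailed_yes_no(question: str, answer: str, distractor: str) -> List[QAPair]:
    """Hypothetical template: turn 'What color is the X?' plus its answer
    into yes/no questions whose answers are entailed by the source pair."""
    match = re.fullmatch(r"What color is (the .+)\?", question.strip(),
                         flags=re.IGNORECASE)
    if match is None:
        return []  # this toy template only covers color questions
    subject = match.group(1)
    return [
        (f"Is {subject} {answer}?", "yes"),
        (f"Is {subject} {distractor}?", "no"),
    ]


def consistency_score(model: Callable[[str], str], pairs: List[QAPair]) -> float:
    """Fraction of entailed questions answered as expected (an assumed,
    simplified score, not the metric defined in the paper)."""
    if not pairs:
        raise ValueError("need at least one entailed QA pair")
    hits = sum(model(q).strip().lower() == a for q, a in pairs)
    return hits / len(pairs)


def inconsistent_model(question: str) -> str:
    """Stand-in for a VQA model bound to one image; it reproduces the
    failure case from the abstract."""
    if question.lower().startswith("what color"):
        return "red"
    return "no"


pairs = entailed_yes_no("What color is the balloon?", "red", distractor="blue")
score = consistency_score(inconsistent_model, pairs)
print(pairs)
print(score)
```

Under these assumptions the stand-in model says "no" to "Is the balloon red?" even though it answered "red" to the source question, so only one of the two entailed questions is answered consistently. A training loop in the spirit of CTM would generate such questions with a learned generator rather than a fixed template.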

updated: Tue Sep 10 2019 18:18:45 GMT+0000 (UTC)

published: Tue Sep 10 2019 18:18:45 GMT+0000 (UTC)

arXiv
