Question-Conditioned Counterfactual Image Generation for VQA

Jingjing Pan; Yash Goyal; Stefan Lee

While Visual Question Answering (VQA) models continue to push the state of the art forward, they largely remain black boxes, providing no insight into how or why an answer is generated. In this ongoing work, we propose addressing this shortcoming by learning to generate counterfactual images for a VQA model: given a question-image pair, we wish to generate a new image such that i) the VQA model outputs a different answer, ii) the new image is minimally different from the original, and iii) the new image is realistic. Our hope is that providing such counterfactual examples allows users to investigate and understand the VQA model's internal mechanisms.
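The three criteria in the abstract can be read as terms of a single objective. The sketch below is only an illustration of that reading, not the authors' method: the abstract does not specify a loss, so the function name `counterfactual_loss`, the weights `lambda_dist` and `lambda_real`, the L1 pixel distance, and the discriminator-style `realism_score` are all assumptions introduced here.

```python
import numpy as np


def counterfactual_loss(answer_probs, original_answer, new_image, original_image,
                        realism_score, lambda_dist=1.0, lambda_real=1.0):
    """Hypothetical combined objective for a counterfactual image (lower is better).

    answer_probs    : 1-D array, VQA model's answer distribution on the new image
    original_answer : index of the answer the model gave on the original image
    realism_score   : value in (0, 1], e.g. from a discriminator (an assumption)
    """
    eps = 1e-8  # guards the logarithms against log(0)

    # (i) Different answer: penalize probability mass left on the original answer.
    answer_term = -np.log(1.0 - answer_probs[original_answer] + eps)

    # (ii) Minimal difference: mean absolute pixel change (L1 distance, assumed).
    distance_term = np.mean(np.abs(new_image - original_image))

    # (iii) Realism: penalize images that the realism scorer rates as unlikely.
    realism_term = -np.log(realism_score + eps)

    return answer_term + lambda_dist * distance_term + lambda_real * realism_term
```

Under these assumptions, an edit that flips the model's answer while changing few pixels and staying realistic receives a lower loss than one that leaves the answer unchanged, alters the whole image, or looks unrealistic. The weights trade off how aggressively the answer is changed against how small and plausible the edit must be.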

updated: Thu Nov 14 2019 19:37:33 GMT+0000 (UTC)

published: Thu Nov 14 2019 19:37:33 GMT+0000 (UTC)

arXiv
