RUBi: Reducing Unimodal Biases in Visual Question Answering

Remi Cadene; Corentin Dancette; Hedi Ben-younes; Matthieu Cord; Devi Parikh

Visual Question Answering (VQA) is the task of answering questions about an image. Some VQA models exploit unimodal biases to provide the correct answer without using the image information. As a result, they suffer a large drop in performance when evaluated on data outside their training set distribution. This critical issue makes them unsuitable for real-world settings. We propose RUBi, a new learning strategy to reduce biases in any VQA model. It reduces the importance of the most biased examples, i.e., examples that can be correctly classified without looking at the image. It implicitly forces the VQA model to use the two input modalities instead of relying on statistical regularities between the question and the answer. We leverage a question-only model that captures the language biases by identifying when these unwanted regularities are used. This question-only model prevents the base VQA model from learning them by influencing its predictions, which dynamically adjusts the loss in order to compensate for biases. We validate our contributions by surpassing the current state-of-the-art results on VQA-CP v2. This dataset is specifically designed to assess the robustness of VQA models when exposed to question biases at test time that differ from those seen during training. Our code is available at github.com/cdancette/rubi.bootstrap.pytorch
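
The loss adjustment described above can be illustrated with a small sketch. It assumes the question-only model's outputs are passed through a sigmoid and multiplied elementwise into the base model's outputs before the softmax cross-entropy is computed; the abstract does not spell out this form, so treat it as an illustrative assumption. The function names, the three-answer vocabulary, and all numeric scores below are hypothetical toy values, not taken from the released code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_entropy(logits, target):
    """Negative log-probability of the target answer index."""
    return -math.log(softmax(logits)[target])


def masked_logits(base_logits, question_only_logits):
    """Assumed masking: scale each base score by sigmoid(question-only score)."""
    return [b * sigmoid(q) for b, q in zip(base_logits, question_only_logits)]


# Toy answer vocabulary (hypothetical): 0 = "yellow", 1 = "green", 2 = "white".
# The question-only branch has learned that "what color is the banana?" -> "yellow".
question_only = [4.0, -4.0, -4.0]

# Biased example: the correct answer matches the language prior ("yellow").
base_biased = [2.0, 1.0, 0.5]
loss_biased_plain = cross_entropy(base_biased, target=0)
loss_biased_masked = cross_entropy(masked_logits(base_biased, question_only), target=0)

# Counter-bias example: the image shows a green banana and the base model sees it.
base_counter = [1.0, 2.0, 0.5]
loss_counter_plain = cross_entropy(base_counter, target=1)
loss_counter_masked = cross_entropy(masked_logits(base_counter, question_only), target=1)

print("biased example:  plain", loss_biased_plain, "masked", loss_biased_masked)
print("counter example: plain", loss_counter_plain, "masked", loss_counter_masked)
```

In this toy setting the masked loss is lower than the plain loss on the biased example and higher on the counter-bias example, which is the sense in which the most biased examples lose importance during training. The non-negative base scores are a simplification chosen to keep the effect of the mask easy to read.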

updated: Mon Mar 23 2020 11:25:27 GMT+0000 (UTC)

published: Mon Jun 24 2019 18:55:24 GMT+0000 (UTC)

arXiv
