Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning

Qingyi Si; Yuanxin Liu; Fandong Meng; Zheng Lin; Peng Fu; Yanan Cao; Weiping Wang; Jie Zhou

Models for Visual Question Answering (VQA) often rely on spurious correlations, i.e., language priors, that appear in the biased samples of the training set, which makes them brittle against out-of-distribution (OOD) test data. Recent methods have made promising progress on this problem by reducing the impact of biased samples on model training. However, these models reveal a trade-off: their improvements on OOD data come at the cost of severely degraded performance on in-distribution (ID) data, which is dominated by biased samples. We therefore propose MMBS, a novel contrastive learning approach for building robust VQA models by Making the Most of Biased Samples. Specifically, we construct positive samples for contrastive learning by eliminating the information related to spurious correlations from the original training samples, and we explore several strategies for using the constructed positive samples in training. Instead of downplaying the importance of biased samples in model training, our approach precisely exploits them to extract the unbiased information that contributes to reasoning. The proposed method is compatible with various VQA backbones. We validate our contributions by achieving competitive performance on the OOD dataset VQA-CP v2 while preserving robust performance on the ID dataset VQA v2.
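The abstract does not specify how MMBS builds its positive samples or which contrastive loss it uses. The sketch below rests on two assumptions: the spurious-correlation information lives in a leading question-type phrase (e.g., "what color is the"), and the objective is a standard InfoNCE loss with in-batch negatives. The prefix list, `make_positive_question`, `cosine`, and `info_nce` are all hypothetical illustrations, not the authors' code. In a real model the vectors would be fused image-question representations produced by the VQA backbone.

```python
import math

# HYPOTHETICAL question-type prefixes assumed to carry the language prior.
# The actual set used by MMBS is not given in the abstract.
QUESTION_TYPE_PREFIXES = [
    "what color is the", "what color is", "what is the",
    "is there a", "how many", "is the",
]


def make_positive_question(question: str) -> str:
    """Build a positive question by dropping a leading question-type phrase."""
    q = question.lower().strip()
    # Try longer prefixes first so "what color is the" wins over "what color is".
    for prefix in sorted(QUESTION_TYPE_PREFIXES, key=len, reverse=True):
        if q == prefix or q.startswith(prefix + " "):
            return q[len(prefix):].strip()
    return q  # no known prefix: keep the question unchanged


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-12)  # guard against zero vectors


def info_nce(anchors, positives, temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: positives[i] pairs with anchors[i]; other rows act as negatives."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # -log softmax probability of the true pair
    return total / n


print(make_positive_question("What color is the banana?"))  # banana?
print(info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]) < 0.01)  # True
```

In this sketch, each original sample is pulled toward its prefix-stripped counterpart and pushed away from the other samples in the batch. This mirrors the abstract's idea of extracting unbiased information from biased samples rather than down-weighting them. The paper explores several strategies for using the positives, and this example does not reproduce any of them specifically.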

updated: Mon Oct 10 2022 11:05:21 GMT+0000 (UTC)

published: Mon Oct 10 2022 11:05:21 GMT+0000 (UTC)

arXiv