Contrast and Classify: Training Robust VQA Models

Yash Kant; Abhinav Moudgil; Dhruv Batra; Devi Parikh; Harsh Agrawal

Recent Visual Question Answering (VQA) models have shown impressive performance on the VQA benchmark but remain sensitive to small linguistic variations in input questions. Existing approaches address this by augmenting the dataset with question paraphrases from visual question generation models or adversarial perturbations. These approaches use the combined data to learn an answer classifier by minimizing the standard cross-entropy loss. To more effectively leverage augmented data, we build on the recent success in contrastive learning. We propose a novel training paradigm (ConClaT) that optimizes both cross-entropy and contrastive losses. The contrastive loss encourages representations to be robust to linguistic variations in questions while the cross-entropy loss preserves the discriminative power of representations for answer prediction. We find that optimizing both losses -- either alternately or jointly -- is key to effective training. On the VQA-Rephrasings benchmark, which measures the VQA model's answer consistency across human paraphrases of a question, ConClaT improves Consensus Score by 1.63% over an improved baseline. In addition, on the standard VQA 2.0 benchmark, we improve the VQA accuracy by 0.78% overall. We also show that ConClaT is agnostic to the type of data-augmentation strategy used.
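The abstract does not give the loss formulas, so the following is only a minimal sketch of how a cross-entropy term and a contrastive term might be combined "alternately or jointly". Everything here is an assumption made for illustration: the InfoNCE-style contrastive form, the cosine similarity, the `temperature`, `weight`, and `every` parameters, and all function names. It should not be read as the authors' implementation.

```python
import math


def cross_entropy(logits, target):
    """Negative log-softmax of the target index (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def _cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """Assumed InfoNCE-style loss: the paraphrase (positive) should be more
    similar to the anchor question than unrelated questions (negatives)."""
    sims = [_cosine(anchor, positive)] + [_cosine(anchor, n) for n in negatives]
    # The positive sits at index 0, so this is a softmax classification over pairs.
    return cross_entropy([s / temperature for s in sims], 0)


def joint_loss(logits, target, anchor, positive, negatives, weight=1.0):
    """Joint variant: sum both losses in a single step (weight is hypothetical)."""
    return cross_entropy(logits, target) + weight * contrastive_loss(
        anchor, positive, negatives
    )


def alternating_loss(step, logits, target, anchor, positive, negatives, every=2):
    """Alternating variant: contrastive loss on every `every`-th step,
    cross-entropy on all other steps (the schedule is hypothetical)."""
    if step % every == every - 1:
        return contrastive_loss(anchor, positive, negatives)
    return cross_entropy(logits, target)
```

In this sketch the representations are plain lists of floats; in a real model they would be the joint image-question embeddings, and the returned scalar would be backpropagated by an autodiff framework.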

updated: Mon Apr 19 2021 03:45:27 GMT+0000 (UTC)

published: Tue Oct 13 2020 00:23:59 GMT+0000 (UTC)

arXiv
