Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions

Daniel Rosenberg; Itai Gat; Amir Feder; Roi Reichart

Deep learning algorithms have shown promising results in visual question answering (VQA) tasks, but a closer look reveals that they often fail to understand the rich signal they are fed. To understand and better measure the generalization capabilities of VQA systems, we examine their robustness to counterfactually augmented data. Our proposed augmentations make a focused intervention on a specific property of the question such that the answer changes. Using these augmentations, we propose a new robustness measure, Robustness to Augmented Data (RAD), which measures the consistency of model predictions between original and augmented examples. Through extensive experimentation, we show that RAD, unlike classical accuracy measures, can quantify when state-of-the-art systems are not robust to counterfactuals. We find substantial failure cases revealing that current VQA systems are still brittle. Finally, we connect robustness and generalization, demonstrating the predictive power of RAD for performance on unseen augmentations.
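The abstract describes RAD only as the consistency of predictions between original and augmented examples. The sketch below assumes one concrete reading of that idea: among the original examples a model answers correctly, the fraction whose augmented counterparts it also answers correctly. The function name `rad`, the (original, augmented) pair format, and the toy data are hypothetical illustrations, not the authors' code; consult the paper for the exact definition.

```python
from typing import Dict, Hashable, List, Tuple

# Hypothetical pairing: (original_id, augmented_id) links an original VQA
# example to its counterfactually augmented version.
Pair = Tuple[Hashable, Hashable]


def rad(
    predictions: Dict[Hashable, str],
    answers: Dict[Hashable, str],
    pairs: List[Pair],
) -> float:
    """Assumed RAD: P(augmented answered correctly | original answered correctly)."""
    # Keep only pairs whose original example the model gets right.
    correct_orig = [(o, a) for o, a in pairs if predictions[o] == answers[o]]
    if not correct_orig:
        return float("nan")  # undefined when no original is answered correctly
    # Count pairs where the augmented example is also answered correctly.
    both = sum(1 for _, a in correct_orig if predictions[a] == answers[a])
    return both / len(correct_orig)


# Toy data (invented for illustration): each augmentation changes the answer.
preds = {"q1": "yes", "q1_aug": "no", "q2": "red", "q2_aug": "red", "q3": "2", "q3_aug": "3"}
gold = {"q1": "yes", "q1_aug": "no", "q2": "red", "q2_aug": "blue", "q3": "1", "q3_aug": "3"}
pairs = [("q1", "q1_aug"), ("q2", "q2_aug"), ("q3", "q3_aug")]

print(rad(preds, gold, pairs))  # → 0.5
```

Conditioning on correctness of the original is what lets this measure diverge from plain accuracy: in the toy data, accuracy is 2/3 on both the original and the augmented examples, yet the model keeps its answer for `q2_aug` even though the intervention changed the correct answer, so the assumed RAD is 0.5.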

Published on arXiv: 8 June 2021, 16:09:47 UTC (no later update)
