Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization

Aishwarya Agrawal; Ivana Kajić; Emanuele Bugliarello; Elnaz Davoodi; Anita Gergely; Phil Blunsom; Aida Nematzadeh

Vision-and-language (V&L) models pretrained on large-scale multimodal data have demonstrated strong performance on various tasks such as image captioning and visual question answering (VQA). The quality of such models is commonly assessed by measuring their performance on unseen data that typically comes from the same distribution as the training data. However, when evaluated under out-of-distribution (out-of-dataset) settings for VQA, we observe that these models exhibit poor generalization. We comprehensively evaluate two pretrained V&L models under different settings (i.e. classification and open-ended text generation) by conducting cross-dataset evaluations. We find that these models tend to learn to solve the benchmark, rather than learning the high-level skills required by the VQA task. We also find that in most cases generative models are less susceptible to shifts in data distribution compared to discriminative ones, and that multimodal pretraining is generally helpful for OOD generalization. Finally, we revisit assumptions underlying the use of automatic VQA evaluation metrics, and empirically show that their stringent nature repeatedly penalizes models for correct responses.
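The abstract's last point, that automatic VQA metrics can penalize correct responses, is easiest to see with a small example. The abstract does not name a specific metric, so the sketch below assumes the widely used VQA accuracy formulation, min(number of annotators who gave the answer / 3, 1), in a simplified form with only lowercasing and whitespace stripping as normalization. The function name `vqa_accuracy` and the sample answers are hypothetical and serve only as illustration.

```python
from collections import Counter
from typing import Sequence


def vqa_accuracy(prediction: str, human_answers: Sequence[str]) -> float:
    """Simplified VQA accuracy (an assumed formulation, not taken from the paper).

    A prediction scores min(n / 3, 1), where n is the number of human
    annotators whose answer matches the prediction exactly after
    lowercasing and stripping surrounding whitespace.
    """
    normalized = prediction.strip().lower()
    counts = Counter(answer.strip().lower() for answer in human_answers)
    return min(counts[normalized] / 3.0, 1.0)


# Hypothetical annotations for a question like "Where is the cat sitting?"
human_answers = ["table"] * 8 + ["desk"] * 2

exact = vqa_accuracy("table", human_answers)           # matches 8 annotators
minority = vqa_accuracy("desk", human_answers)         # matches 2 annotators
paraphrase = vqa_accuracy("on the table", human_answers)  # matches none

print(exact, minority, paraphrase)
```

Under these assumptions the paraphrase "on the table" receives no credit even though a human reader would likely accept it, because the metric relies on exact string matching. Open-ended generative models are more likely to produce such longer or reworded answers than classifiers restricted to a fixed answer vocabulary.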

updated: Sat Apr 01 2023 07:07:44 GMT+0000 (UTC)

published: Tue May 24 2022 16:44:45 GMT+0000 (UTC)

arXiv
