CARETS: A Consistency And Robustness Evaluative Test Suite for VQA

Carlos E. Jimenez; Olga Russakovsky; Karthik Narasimhan

We introduce CARETS, a systematic test suite that measures the consistency and robustness of modern VQA models through a series of six fine-grained capability tests. In contrast to existing VQA test sets, CARETS features balanced question generation to create pairs of test instances, with each pair focusing on a specific capability such as rephrasing, logical symmetry, or image obfuscation. We evaluate six modern VQA systems on CARETS and identify several actionable weaknesses in model comprehension, especially with concepts such as negation, disjunction, or hypernym invariance. Interestingly, even the most sophisticated models are sensitive to aspects such as swapping the order of terms in a conjunction or varying the number of answer choices mentioned in the question. We release CARETS to be used as an extensible tool for evaluating multi-modal model robustness.
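To make the idea of pair-based testing concrete, the following is a minimal sketch of how a model might be scored on pairs of instances. It is not the CARETS implementation: the names `Instance`, `TestPair`, `pair_scores`, and `toy_model`, the two example pairs, and the choice of "both answers correct" as the pair-level score are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Instance:
    """One VQA example: an image reference, a question, and the expected answer."""
    image_id: str
    question: str
    answer: str


@dataclass(frozen=True)
class TestPair:
    """An original instance and a perturbed variant targeting one capability."""
    original: Instance
    perturbed: Instance
    capability: str


# Hypothetical model interface: (image_id, question) -> predicted answer.
Model = Callable[[str, str], str]


def pair_scores(model: Model, pairs: Iterable[TestPair]) -> Dict[str, float]:
    """Return accuracy on the originals and the fraction of pairs answered
    correctly on BOTH members (an assumed pair-level score)."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one test pair is required")
    original_correct = 0
    both_correct = 0
    for pair in pairs:
        first_ok = model(pair.original.image_id, pair.original.question) == pair.original.answer
        second_ok = model(pair.perturbed.image_id, pair.perturbed.question) == pair.perturbed.answer
        original_correct += int(first_ok)
        both_correct += int(first_ok and second_ok)
    total = len(pairs)
    return {"accuracy": original_correct / total, "pair_accuracy": both_correct / total}


def toy_model(image_id: str, question: str) -> str:
    """A deliberately brittle stand-in model that is sensitive to word order."""
    if "car" in question:
        return "red"
    return "yes" if question.startswith("Is there a dog") else "no"


pairs = [
    TestPair(
        Instance("img1", "Is there a dog and a cat?", "yes"),
        Instance("img1", "Is there a cat and a dog?", "yes"),
        capability="conjunction order",
    ),
    TestPair(
        Instance("img2", "What color is the car?", "red"),
        Instance("img2", "The car is what color?", "red"),
        capability="rephrasing",
    ),
]

scores = pair_scores(toy_model, pairs)
print(scores)
```

In this toy setup the stand-in model answers every original question correctly but fails when the conjunction order is swapped, so its pair-level score falls below its plain accuracy. Scoring the two members of a pair jointly is what exposes that kind of inconsistency.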

updated: Tue Mar 15 2022 03:01:03 GMT+0000 (UTC)

published: Tue Mar 15 2022 03:01:03 GMT+0000 (UTC)

arXiv
