Assessing the Robustness of Visual Question Answering Models

Jia-Hong Huang; Modar Alfadly; Bernard Ghanem; Marcel Worring

Deep neural networks have played an essential role in Visual Question Answering (VQA). Until recently, research focused mainly on their accuracy. There is now a trend toward assessing the robustness of these models against adversarial attacks by measuring how their accuracy changes as the level of noise in their inputs increases. In VQA, an attack can target the image and/or the query question, dubbed the main question, yet this aspect of VQA has not been properly analyzed. In this work, we propose a new method that uses semantically related questions, dubbed basic questions, as noise to evaluate the robustness of VQA models. We hypothesize that as the similarity of a basic question to the main question decreases, the level of noise increases. To generate a reasonable noise level for a given main question, we rank a pool of basic questions by their similarity to this main question, and we cast this ranking problem as a LASSO optimization problem. We also propose a novel robustness measure, Rscore, and two large-scale basic question datasets in order to standardize the robustness analysis of VQA models. The experimental results demonstrate that the proposed evaluation method can effectively analyze the robustness of VQA models. To foster VQA research, we will publish our proposed datasets.
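The ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the sentence embedding, the exact objective, or the solver, so everything below is an assumption made for illustration: each question is taken to be a fixed-length vector already, the objective is the standard LASSO form 0.5 * ||A x - b||^2 + lam * ||x||_1 (columns of A are basic-question vectors, b is the main-question vector), and it is solved by plain coordinate descent. The function names, the toy vectors, and the value of `lam` are hypothetical.

```python
def soft_threshold(value, threshold):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_weights(columns, target, lam, n_iters=200):
    """Minimise 0.5 * ||sum_j x_j * columns[j] - target||^2 + lam * ||x||_1
    by cyclic coordinate descent. `columns` is a list of equal-length vectors."""
    x = [0.0] * len(columns)
    residual = list(target)  # target - A x, with x initialised to zero
    col_sq = [sum(v * v for v in col) for col in columns]
    for _ in range(n_iters):
        for j, col in enumerate(columns):
            if col_sq[j] == 0.0:
                continue  # an all-zero column cannot explain anything
            # Correlation of column j with the residual that excludes
            # column j's own current contribution.
            rho = sum(c * r for c, r in zip(col, residual)) + col_sq[j] * x[j]
            new_xj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_xj - x[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                x[j] = new_xj
    return x


def rank_basic_questions(main_vec, basic_vecs, lam=0.1):
    """Return (index, weight) pairs, largest LASSO weight first.
    Assumption: a larger weight is read as higher similarity (lower noise)."""
    weights = lasso_weights(basic_vecs, main_vec, lam)
    order = sorted(range(len(basic_vecs)), key=lambda j: weights[j], reverse=True)
    return [(j, weights[j]) for j in order]


# Toy, hand-made vectors standing in for real question embeddings.
main_vec = [1.0, 0.0, 0.0]
basic_vecs = [
    [0.9, 0.1, 0.0],  # close to the main question
    [0.0, 1.0, 0.0],  # unrelated direction
    [0.5, 0.5, 0.0],  # partially related
]
ranking = rank_basic_questions(main_vec, basic_vecs, lam=0.1)
for index, weight in ranking:
    print(index, round(weight, 4))
```

Under this reading, questions further down the ranking would act as noisier inputs when appended to the main question. The L1 penalty drives the weights of weakly related questions to exactly zero, which is one reason a sparse formulation suits picking a few top candidates from a large pool.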

updated: Thu Mar 03 2022 14:17:46 GMT+0000 (UTC)

published: Sat Nov 30 2019 09:32:38 GMT+0000 (UTC)

arXiv
