Regularizing Attention Networks for Anomaly Detection in Visual Question Answering

Doyup Lee; Yeongjae Cheon; Wook-Shin Han

For stability and reliability in real-world applications, the robustness of DNNs has been evaluated on unimodal tasks. However, few studies consider the abnormal situations that a visual question answering (VQA) model might encounter at test time after deployment in the real world. In this study, we evaluate the robustness of state-of-the-art VQA models to five different anomalies, including worst-case scenarios, the most frequent scenarios, and the current limitation of VQA models. Unlike the results in unimodal tasks, the maximum confidence of answers in VQA models cannot detect anomalous inputs, and post-training of the outputs, such as outlier exposure, is ineffective for VQA models. We therefore propose an attention-based method that uses the confidence of reasoning between input images and questions and shows much more promising results than previous methods from unimodal tasks. In addition, we show that maximum entropy regularization of attention networks can significantly improve attention-based anomaly detection in VQA models. Thanks to their simplicity, attention-based anomaly detection and the regularization are model-agnostic and can be applied to the various cross-modal attention mechanisms in state-of-the-art VQA models. The results imply that cross-modal attention in VQA is important for improving not only VQA accuracy but also robustness to various anomalies.
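The two ideas named in the abstract, scoring anomalies from attention and regularizing attention toward maximum entropy, can be illustrated with a minimal sketch. The abstract does not give formulas, so everything below is an assumption made for illustration: the anomaly score is taken to be one minus the largest attention weight, the regularizer subtracts the Shannon entropy of the attention distribution from a task loss, and the function names and the weight `lam` are hypothetical rather than taken from the paper.

```python
import math


def softmax(logits):
    """Turn raw attention logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_entropy(weights):
    """Shannon entropy (in nats) of an attention distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def anomaly_score(weights):
    """Assumed score: a low peak attention weight is read as low
    reasoning confidence, so the input is treated as more anomalous."""
    return 1.0 - max(weights)


def regularized_loss(task_loss, weights, lam=0.1):
    """Assumed maximum entropy regularization: subtracting the entropy
    means that minimizing this loss rewards higher-entropy attention.
    `lam` is a hypothetical trade-off weight."""
    return task_loss - lam * attention_entropy(weights)


# Attention over four image regions: one sharply peaked, one uniform.
peaked = softmax([5.0, 0.0, 0.0, 0.0])
flat = softmax([0.0, 0.0, 0.0, 0.0])

print(anomaly_score(peaked), anomaly_score(flat))
print(regularized_loss(1.0, peaked), regularized_loss(1.0, flat))
```

Under these assumptions, the uniform attention receives the higher anomaly score, and it also receives the lower regularized loss because its entropy is larger. How the paper actually balances these two effects is not stated in the abstract.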

updated: Mon Mar 01 2021 12:46:47 GMT+0000 (UTC)

published: Mon Sep 21 2020 17:47:49 GMT+0000 (UTC)

arXiv
