Improve the Interpretability of Attention: A Fast, Accurate, and Interpretable High-Resolution Attention Model

Tristan Gomez; Suiyi Ling; Thomas Fréour; Harold Mouchère

The prevalence of attention mechanisms has raised concerns about the interpretability of attention distributions. Although attention provides insight into how a model operates, using it as an explanation of model predictions remains highly dubious. The community is still seeking more interpretable strategies for identifying the local active regions that contribute most to the final decision. To improve the interpretability of existing attention models, we propose a novel Bilinear Representative Non-Parametric Attention (BR-NPA) strategy that captures task-relevant, human-interpretable information. The target model is first distilled to produce higher-resolution intermediate feature maps. Representative features are then grouped from these maps based on local pairwise feature similarity, producing finer-grained, more precise attention maps that highlight task-relevant parts of the input. The resulting attention maps are ranked according to the "active level" of the compound feature, which indicates the relative importance of the highlighted regions. The proposed model can easily be adapted to a wide variety of modern deep models that involve classification. It is also more accurate, faster, and has a smaller memory footprint than usual neural attention modules. Extensive experiments show more comprehensive visual explanations than the state-of-the-art visualization model across multiple tasks, including few-shot classification, person re-identification, and fine-grained image classification. The proposed visualization model sheds light on how neural networks "pay attention" differently in different tasks.
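The grouping-and-ranking idea described in the abstract can be illustrated with a simplified sketch. This is not the authors' BR-NPA implementation. It assumes that a location's "active level" is the L2 norm of its feature vector, and it uses global cosine similarity in place of the local pairwise similarity the abstract mentions. The function name `representative_attention_maps` and the `threshold` parameter are hypothetical.

```python
from __future__ import annotations

import numpy as np


def representative_attention_maps(
    features: np.ndarray, num_maps: int, threshold: float = 0.9
) -> tuple[list[np.ndarray], list[float]]:
    """Hypothetical sketch: derive ranked attention maps from an (H, W, C) feature map.

    Assumption: the "active level" of a location is the L2 norm of its feature vector.
    Maps are returned in descending order of that activity score.
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    norms = np.linalg.norm(flat, axis=1)
    available = np.ones(h * w, dtype=bool)  # locations not yet claimed by a group
    maps: list[np.ndarray] = []
    scores: list[float] = []
    for _ in range(num_maps):
        if not available.any():
            break
        # Pick the most active remaining location as the group's representative.
        idx = int(np.argmax(np.where(available, norms, -np.inf)))
        rep = flat[idx]
        # Cosine similarity between the representative and every location.
        sim = flat @ rep / (norms * norms[idx] + 1e-8)
        # Negative similarities are clipped so the map highlights only similar regions.
        maps.append(np.clip(sim, 0.0, None).reshape(h, w))
        scores.append(float(norms[idx]))
        # Locations strongly similar to this representative join its group.
        available &= sim < threshold
        available[idx] = False
    return maps, scores
```

For example, a 2x2 feature map whose left column points along one channel and right column along another yields two maps, the first seeded by the stronger of the two groups. Picking the highest-norm location first is what makes the maps come out already ranked.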

updated: Mon Jun 07 2021 10:25:01 GMT+0000 (UTC)

published: Fri Jun 04 2021 15:57:37 GMT+0000 (UTC)

arXiv
