Generalizing Adversarial Explanations with Grad-CAM

Tanmay Chakraborty; Utkarsh Trehan; Khawla Mallat; Jean-Luc Dugelay

Gradient-weighted Class Activation Mapping (Grad-CAM) is an example-based explanation method that provides a gradient activation heatmap as an explanation for Convolutional Neural Network (CNN) models. The drawback of this method is that it cannot be used to generalize CNN behaviour. In this paper, we present a novel method that extends Grad-CAM from example-based explanations to a method for explaining global model behaviour. This is achieved by introducing two new metrics for model generalization: (i) Mean Observed Dissimilarity (MOD) and (ii) Variation in Dissimilarity (VID). These metrics are computed from the Normalized Inverted Structural Similarity Index (NISSIM) between the Grad-CAM heatmaps of samples from the original test set and the heatmaps of their counterparts in the adversarial test set. In our experiments, we study adversarial attacks generated with the Fast Gradient Sign Method (FGSM) on deep models such as VGG16, ResNet50, and ResNet101, and on wide models such as InceptionNetv3 and XceptionNet. We then compute MOD and VID for the automatic face recognition (AFR) use case with the VGGFace2 dataset. Across all models under adversarial attack, we observe a consistent shift in the region highlighted by the Grad-CAM heatmap, reflecting that region's participation in decision-making. The proposed method can be used to understand adversarial attacks and to explain the behaviour of black-box CNN models for image analysis.
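
The abstract names the pieces of the pipeline (Grad-CAM heatmaps, FGSM perturbations, NISSIM, MOD and VID) without defining them. The Python/NumPy sketch below shows one plausible way they fit together; it is not the authors' implementation. Grad-CAM and the FGSM step follow their standard formulations, with the model's gradients passed in as arrays rather than computed by a framework. The NISSIM definition used here, (1 − SSIM)/2, and the reading of MOD and VID as the mean and standard deviation of per-sample NISSIM scores are assumptions, as are all function names (`grad_cam`, `fgsm`, `ssim`, `nissim`, `mod_vid`). The SSIM variant uses a uniform 7×7 window with the usual constants and population statistics, so its values can differ slightly from library implementations such as scikit-image's. Heatmaps are assumed to be 2-D, scaled to [0, 1], and at least 7×7 pixels.

```python
import numpy as np
from typing import Sequence, Tuple
from numpy.lib.stride_tricks import sliding_window_view


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM map for one conv layer.

    activations, gradients: shape (K, H, W); gradients are d(class score)/d(activations).
    Returns an (H, W) map scaled to [0, 1]; upsampling to the input size is omitted.
    """
    weights = gradients.mean(axis=(1, 2))  # alpha_k: spatially averaged gradients
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # ReLU(sum_k alpha_k A_k)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def fgsm(x: np.ndarray, loss_grad: np.ndarray, eps: float) -> np.ndarray:
    """One FGSM step: x_adv = clip(x + eps * sign(dLoss/dx), 0, 1); loss_grad comes from the model."""
    return np.clip(x + eps * np.sign(loss_grad), 0.0, 1.0)


def ssim(x: np.ndarray, y: np.ndarray, win: int = 7, data_range: float = 1.0) -> float:
    """Mean SSIM over all valid win x win windows (uniform window, population statistics)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2

    def local_mean(img: np.ndarray) -> np.ndarray:
        # Mean of every win x win window; raises ValueError if img is smaller than the window.
        return sliding_window_view(img, (win, win)).mean(axis=(-2, -1))

    mu_x, mu_y = local_mean(x), local_mean(y)
    var_x = local_mean(x * x) - mu_x ** 2
    var_y = local_mean(y * y) - mu_y ** 2
    cov_xy = local_mean(x * y) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float((num / den).mean())


def nissim(h_clean: np.ndarray, h_adv: np.ndarray) -> float:
    """ASSUMED NISSIM: SSIM in [-1, 1] inverted and rescaled to [0, 1] (0 = identical maps)."""
    return float(np.clip((1.0 - ssim(h_clean, h_adv)) / 2.0, 0.0, 1.0))


def mod_vid(clean_maps: Sequence[np.ndarray], adv_maps: Sequence[np.ndarray]) -> Tuple[float, float]:
    """ASSUMED MOD/VID: mean and standard deviation of the per-sample NISSIM scores."""
    scores = np.array([nissim(c, a) for c, a in zip(clean_maps, adv_maps)])
    return float(scores.mean()), float(scores.std())
```

For instance, `mod_vid(clean, clean)` on a list of identical heatmaps gives (approximately) zero for both metrics, while pairing each clean heatmap with a noisier or shifted version yields a positive MOD. Under these assumptions, a larger MOD means the attack moves the highlighted region more on average, and a larger VID means that shift is less uniform across samples.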

updated: Mon Apr 11 2022 22:09:21 GMT+0000 (UTC)

published: Mon Apr 11 2022 22:09:21 GMT+0000 (UTC)

arXiv