Deep neural network loses attention to adversarial images

Shashank Kotyan; Danilo Vasconcellos Vargas

Adversarial algorithms have been shown to be effective against neural networks for a variety of tasks. In image classification, some adversarial algorithms perturb all the pixels in the image minimally, whereas others perturb a few pixels strongly. However, very little is known about why such diverse adversarial samples exist. Recently, Vargas et al. showed that the existence of these adversarial samples might be due to conflicting saliency within the neural network. We test this hypothesis of conflicting saliency by analysing the Saliency Maps (SM) and Gradient-weighted Class Activation Maps (Grad-CAM) of original samples and of several different types of adversarial samples. We also analyse how different adversarial samples distort the attention of the neural network compared to original samples. We show that in the case of Pixel Attack, perturbed pixels either call the network's attention to themselves or divert attention away from themselves. In contrast, the Projected Gradient Descent Attack perturbs pixels so that intermediate layers inside the neural network lose attention for the correct class. We also show that the two attacks affect the saliency maps and activation maps differently. This sheds light on why some defences that succeed against some attacks remain vulnerable to others. We hope that this analysis will improve understanding of the existence and the effect of adversarial samples and enable the community to develop more robust neural networks.
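To make two of the terms in the abstract concrete, the following is a minimal sketch of an L-infinity Projected Gradient Descent attack and a vanilla gradient saliency map. It is not the authors' setup: the toy logistic model, its weights `w` and `b`, the four-"pixel" input `x`, and the values of `eps`, `step`, and `iters` are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a toy logistic model (hypothetical stand-in for a network)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss_gradient(w, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x: (p - y) * w."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def saliency(w, b, x, y):
    """Vanilla saliency: absolute input gradient, one value per 'pixel'."""
    return [abs(g) for g in loss_gradient(w, b, x, y)]

def pgd_attack(w, b, x, y, eps=0.3, step=0.05, iters=20):
    """L-infinity PGD: signed-gradient ascent on the loss, projected back
    into the eps-ball around x and into the valid pixel range [0, 1]."""
    x_adv = list(x)
    for _ in range(iters):
        grad = loss_gradient(w, b, x_adv, y)
        for i, g in enumerate(grad):
            sign = (g > 0) - (g < 0)
            xi = x_adv[i] + step * sign
            xi = min(max(xi, x[i] - eps), x[i] + eps)  # project onto the eps-ball
            x_adv[i] = min(max(xi, 0.0), 1.0)          # keep a valid pixel value
    return x_adv

# Assumed toy values, not taken from the paper.
w = [2.0, -1.0, 0.5, 3.0]
b = -2.0
x = [0.8, 0.2, 0.5, 0.6]
y = 1  # true label

x_adv = pgd_attack(w, b, x, y)
print("clean prob:", predict(w, b, x), "adversarial prob:", predict(w, b, x_adv))
print("clean saliency:", saliency(w, b, x, y))
print("adversarial saliency:", saliency(w, b, x_adv, y))
```

Because this toy model is linear, the saliency values only change in magnitude between the clean and the adversarial input. The spatial shifts of attention described in the abstract require a deep network and are not reproduced by this sketch.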

updated: Thu Jun 10 2021 11:06:17 GMT+0000 (UTC)

published: Thu Jun 10 2021 11:06:17 GMT+0000 (UTC)

arXiv

