Identifying Layers Susceptible to Adversarial Attacks

Shoaib Ahmed Siddiqui; Thomas Breuel

Common neural network architectures are susceptible to attack by adversarial samples. Neural network architectures are commonly thought of as divided into low-level feature extraction layers and high-level classification layers, and the susceptibility of networks to adversarial samples is often regarded as a problem of classification rather than feature extraction. We test this idea by selectively retraining different portions of VGG and ResNet architectures on CIFAR-10, Imagenette, and ImageNet using non-adversarial and adversarial data. Our experimental results show that susceptibility to adversarial samples is associated with low-level feature extraction layers; retraining only the high-level layers is therefore insufficient for achieving robustness. This phenomenon could have two explanations: either adversarial attacks yield outputs from early layers that are indistinguishable from features found in the attack classes, or they yield outputs from early layers that differ statistically from the features of non-adversarial samples and do not permit consistent classification by subsequent layers. We investigate this question through large-scale non-linear dimensionality reduction and density modeling on the distributions of feature vectors in hidden layers, and find that the feature distributions of non-adversarial and adversarial samples differ substantially. Our results provide new insights into the statistical origins of adversarial samples and possible defenses.
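
To make "selectively retraining different portions" of a network concrete, the following is a minimal toy sketch, not the authors' code. It assumes a hypothetical two-parameter model (`TwoStageModel`, with one "feature" weight and one "classifier" weight) and a hypothetical `retrain` helper that updates only the stages named in `trainable`, leaving the others frozen. The real experiments use VGG and ResNet on image data; this only illustrates the freezing mechanism.

```python
from dataclasses import dataclass


@dataclass
class TwoStageModel:
    """Hypothetical stand-in for a network: one early and one late stage."""
    w_feature: float = 0.5      # stands in for low-level feature extraction
    w_classifier: float = 0.5   # stands in for high-level classification

    def forward(self, x: float) -> float:
        return self.w_classifier * (self.w_feature * x)


def retrain(model, data, trainable, lr=0.01, epochs=200):
    """Gradient descent on squared error, updating only stages in `trainable`."""
    for _ in range(epochs):
        for x, target in data:
            hidden = model.w_feature * x
            error = model.w_classifier * hidden - target
            # Gradients of (prediction - target)^2 with respect to each weight.
            grad_classifier = 2 * error * hidden
            grad_feature = 2 * error * model.w_classifier * x
            if "classifier" in trainable:
                model.w_classifier -= lr * grad_classifier
            if "feature" in trainable:
                model.w_feature -= lr * grad_feature
    return model


def mean_squared_error(model, data):
    return sum((model.forward(x) - t) ** 2 for x, t in data) / len(data)


# Toy regression data (target = 2x); an assumption made purely for illustration.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

model = TwoStageModel()
loss_before = mean_squared_error(model, data)
# Retrain only the late stage; the early stage stays frozen.
retrain(model, data, trainable={"classifier"})
loss_after = mean_squared_error(model, data)
```

Passing `trainable={"feature"}` instead would retrain only the early stage. In the setting the abstract describes, the analogous comparison would be between retraining early versus late layers on adversarial data, which this toy does not attempt to reproduce.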

updated: Sat Jul 10 2021 12:38:49 GMT+0000 (UTC)

published: Sat Jul 10 2021 12:38:49 GMT+0000 (UTC)

arXiv
