Universalization of any adversarial attack using very few test examples

Sandesh Kamath; Amit Deshpande; K V Subrahmanyam; Vineeth N Balasubramanian

Deep learning models are known to be vulnerable not only to input-dependent adversarial attacks but also to input-agnostic, or universal, adversarial attacks. Dezfooli et al. [Dezfooli17, Dezfooli17anal] construct a universal adversarial attack on a given model by looking at a large number of training data points and the geometry of the decision boundary near them. Subsequent work [Khrulkov18] constructs a universal attack by looking only at test examples and intermediate layers of the given model. In this paper, we propose a simple universalization technique that takes any input-dependent adversarial attack and constructs a universal attack by looking at only a very small number of adversarial test examples. The technique does not require details of the given model and adds negligible computational overhead. We theoretically justify it by a spectral property common to many input-dependent adversarial perturbations, e.g., gradients, the Fast Gradient Sign Method (FGSM), and DeepFool. Using matrix concentration inequalities and spectral perturbation bounds, we show that the top singular vector of input-dependent adversarial directions on a small test sample gives an effective and simple universal adversarial attack. For VGG16 and VGG19 models trained on ImageNet, our simple universalization of gradient, FGSM, and DeepFool perturbations using a test sample of 64 images gives fooling rates comparable to state-of-the-art universal attacks [Dezfooli17, Khrulkov18] for reasonable norms of perturbation. Code is available at https://github.com/ksandeshk/svd-uap.
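The core step described in the abstract, taking the top singular vector of a small set of input-dependent adversarial directions, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (see the linked repository for that): the function name `universalize`, the row normalization, the L2 scaling to `eps`, and the synthetic toy data are all hypothetical choices made for this example. In practice the rows would be per-image perturbations (e.g., gradient, FGSM, or DeepFool directions) computed on a small test sample such as 64 images.

```python
import numpy as np

def universalize(perturbations, eps):
    """Return a single direction built from per-example perturbations.

    Assumption for this sketch: each perturbation is flattened and
    normalized to unit L2 norm, and the result is scaled to L2 norm eps.
    """
    X = np.asarray(perturbations, dtype=np.float64)
    X = X.reshape(X.shape[0], -1)                      # one row per example
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X = X / np.maximum(norms, 1e-12)                   # unit-norm directions
    # Rows of vt are right singular vectors, ordered by singular value.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return eps * vt[0]                                 # top singular vector

# Toy data standing in for real adversarial directions: every row is a
# shared direction (with a random sign) plus small noise.
rng = np.random.default_rng(0)
n, d = 64, 50
shared = rng.normal(size=d)
shared /= np.linalg.norm(shared)
signs = rng.choice([-1.0, 1.0], size=(n, 1))
perturbations = signs * shared + 0.1 * rng.normal(size=(n, d))

u = universalize(perturbations, eps=10.0)
# Cosine similarity (up to sign) between the recovered and shared direction.
alignment = abs(u @ shared) / np.linalg.norm(u)
print(f"norm = {np.linalg.norm(u):.2f}, alignment = {alignment:.3f}")
```

Note that the SVD determines a singular vector only up to sign, so a sketch like this returns either `v` or `-v`; which sign (if either) is preferable for a given model is something this example does not address.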

updated: Fri Oct 28 2022 17:37:32 GMT+0000 (UTC)

published: Mon May 18 2020 12:17:38 GMT+0000 (UTC)

arXiv
