Releasing Inequality Phenomena in L_∞-Adversarial Training via Input Gradient Distillation

Junxi Chen; Junhao Dong; Xiaohua Xie

Since adversarial examples were discovered and shown to cause catastrophic degradation in DNNs, many adversarial defense methods have been devised, among which adversarial training is considered the most effective. However, recent work identified inequality phenomena in l_∞-adversarial training and revealed that l_∞-adversarially trained models are vulnerable when a few important pixels are perturbed by i.i.d. noise or occluded. In this paper, we propose a simple yet effective method called Input Gradient Distillation (IGD) to release the inequality phenomena in l_∞-adversarial training. Experiments show that, while preserving the model's adversarial robustness, IGD reduces the error rate of l_∞-adversarially trained models relative to PGDAT by up to 60% under inductive noise, by up to 16.53% under inductive occlusion, and by up to 21.11% on noisy images in ImageNet-C. Moreover, we formally explain why the equality of the model's saliency map can improve such robustness.
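The abstract does not spell out the IGD objective or how saliency-map equality is measured. As a rough illustration only, the sketch below assumes (1) that the distillation term aligns the adversarially trained student's input gradient with a teacher model's input gradient through cosine similarity, and (2) that saliency-map inequality is quantified with the Gini coefficient of absolute gradient values. Both are assumptions for illustration, not the authors' confirmed implementation. The function names are hypothetical, and the "saliency maps" are hand-written toy vectors rather than gradients taken from a real network.

```python
import math


def gini(saliency):
    """Gini coefficient of |saliency| values.

    0.0 means the attribution mass is spread perfectly evenly; (n - 1) / n
    means all of it sits on a single pixel.
    """
    xs = sorted(abs(v) for v in saliency)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form using 1-indexed ranks over ascending values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def cosine_similarity(a, b):
    """Cosine similarity of two flattened vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def gradient_alignment_loss(student_grad, teacher_grad):
    """Hypothetical IGD-style term: 1 - cos(student input grad, teacher input grad).

    Ranges from 0 (same direction) to 2 (opposite directions).
    """
    return 1.0 - cosine_similarity(student_grad, teacher_grad)


# Toy flattened saliency maps over 8 pixels.
concentrated = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 2.0]  # a few dominant pixels
spread = [0.3, 0.2, 0.25, 0.3, 0.2, 0.25, 0.3, 0.2]  # evenly distributed

print("Gini (concentrated):", round(gini(concentrated), 3))
print("Gini (spread):", round(gini(spread), 3))
print("alignment loss (spread vs concentrated):",
      round(gradient_alignment_loss(spread, concentrated), 3))
```

In a real training loop the input gradients would come from automatic differentiation (for example, PyTorch's `torch.autograd.grad` with `create_graph=True`, so that the alignment term can itself be backpropagated), and the term would presumably be added, with some weighting hyperparameter, to the PGD-AT cross-entropy loss.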

updated: Wed May 17 2023 15:03:17 GMT+0000 (UTC)

published: Tue May 16 2023 09:23:42 GMT+0000 (UTC)

arXiv
