Releasing Inequality Phenomena in L_∞-Adversarial Training via Input Gradient Distillation

Junxi Chen; Junhao Dong; Xiaohua Xie

Since adversarial examples were shown to cause catastrophic degradation in DNNs, many adversarial defense methods have been devised, among which adversarial training is considered the most effective. However, recent work identified the inequality phenomena in l_∞-adversarial training and revealed that l_∞-adversarially trained models become vulnerable when a few important pixels are perturbed by i.i.d. noise or occluded. In this paper, we propose a simple yet effective method called Input Gradient Distillation to release the inequality phenomena in l_∞-adversarial training. Experiments show that, while preserving the model's adversarial robustness, Input Gradient Distillation improves the model's robustness to i.i.d. noise and occlusion. Moreover, we formally explain why a more equal saliency map can improve the model's robustness to i.i.d. noise or occlusion. GitHub: https://github.com/fhdnskfbeuv/Inuput-Gradient-Distillation
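The abstract does not say how the "inequality" of a saliency map is quantified. As a minimal sketch, assuming the Gini coefficient is used as the measure (an assumption made here for illustration, not something the abstract states), the following shows how a map concentrated on a few pixels scores higher than a uniform one. The function name `gini` and the toy saliency values are hypothetical.

```python
def gini(values):
    """Gini coefficient of the absolute values in `values`.

    Returns 0.0 for a perfectly equal map and approaches 1.0 as the
    mass concentrates on a single entry. Assumed here as one possible
    inequality measure; the abstract does not name a specific metric.
    """
    xs = sorted(abs(v) for v in values)  # ascending magnitudes
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-based formula with 1-based rank i over sorted values:
    # G = sum_i (2*i - n - 1) * x_i / (n * total)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Toy flattened saliency maps (hypothetical values).
uniform_map = [1.0, 1.0, 1.0, 1.0]        # every pixel matters equally
concentrated_map = [0.0, 0.0, 0.0, 1.0]   # one pixel carries everything

print(gini(uniform_map))       # 0.0
print(gini(concentrated_map))  # 0.75
```

Under this reading, a model whose saliency mass sits on a handful of pixels (high Gini) has more to lose when exactly those pixels are perturbed or occluded, which is the vulnerability the abstract describes.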

updated: Tue May 16 2023 09:23:42 GMT+0000 (UTC)

published: Tue May 16 2023 09:23:42 GMT+0000 (UTC)

arXiv
