Adversarial Purification through Representation Disentanglement

Tao Bai; Jun Zhao; Lanqing Guo; Bihan Wen

Deep learning models are vulnerable to adversarial examples and make incomprehensible mistakes, which threatens their real-world deployment. Combined with the idea of adversarial training, preprocessing-based defenses are popular and convenient to use because they are task-independent and generalize well. Current defense methods, especially purification, tend to remove "noise" by learning to recover the natural images. Unlike random noise, however, adversarial patterns are strongly correlated with the images and are therefore much easier to overfit during model training. In this work, we propose a novel adversarial purification scheme that disentangles natural images from adversarial perturbations as a preprocessing defense. Extensive experiments show that our defense generalizes and provides significant protection against unseen strong adversarial attacks. It reduces the success rate of state-of-the-art ensemble attacks from 61.7% to 14.9% on average, outperforming a number of existing methods. Notably, our defense restores the perturbed images perfectly and does not hurt the clean accuracy of backbone models, which is highly desirable in practice.
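To make the idea of a preprocessing defense concrete, here is a minimal sketch of how a disentangling purifier could sit in front of an unchanged backbone classifier. Everything in it is an assumption made for illustration: the names `toy_disentangle`, `toy_classifier`, and `purified_predict` are hypothetical, images are reduced to flat lists of intensities, and a 3-tap moving average stands in for the learned model, which the abstract does not specify.

```python
from typing import Callable, List, Tuple

Image = List[float]  # toy stand-in: a flat list of pixel intensities in [0, 1]


def toy_disentangle(x: Image) -> Tuple[Image, Image]:
    """Hypothetical stand-in for a learned disentangling model.

    Splits x into a smoothed "natural" estimate and the residual, so that
    natural[i] + perturbation[i] equals x[i] up to floating-point error.
    The 3-tap moving average is an illustrative assumption only.
    """
    n = len(x)
    natural = []
    for i in range(n):
        window = x[max(0, i - 1): min(n, i + 2)]
        natural.append(sum(window) / len(window))
    perturbation = [xi - ni for xi, ni in zip(x, natural)]
    return natural, perturbation


def toy_classifier(x: Image) -> int:
    """Hypothetical backbone: predicts 1 if any pixel exceeds 0.7, else 0."""
    return int(max(x) > 0.7)


def purified_predict(
    x: Image,
    disentangle: Callable[[Image], Tuple[Image, Image]],
    classifier: Callable[[Image], int],
) -> int:
    """Preprocessing defense: classify the natural estimate, not the raw input."""
    natural, _perturbation = disentangle(x)
    return classifier(natural)


if __name__ == "__main__":
    clean = [0.5, 0.5, 0.5, 0.5, 0.5]
    adversarial = [0.5, 0.5, 0.9, 0.5, 0.5]  # a single perturbed pixel
    print(toy_classifier(adversarial))
    print(purified_predict(adversarial, toy_disentangle, toy_classifier))
    print(purified_predict(clean, toy_disentangle, toy_classifier))
```

The point of the design is that the backbone is never retrained: the purifier is a separate stage, so the same wrapper could in principle be placed in front of any classifier.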

updated: Fri Oct 15 2021 01:45:31 GMT+0000 (UTC)

published: Fri Oct 15 2021 01:45:31 GMT+0000 (UTC)

arXiv
