Combating Adversaries with Anti-Adversaries

Motasem Alfarra; Juan C. Pérez; Ali Thabet; Adel Bibi; Philip H. S. Torr; Bernard Ghanem

Deep neural networks are vulnerable to small input perturbations known as adversarial attacks. Inspired by the fact that these adversaries are constructed by iteratively minimizing the confidence of a network for the true class label, we propose the anti-adversary layer, aimed at countering this effect. In particular, our layer generates an input perturbation in the opposite direction of the adversarial one and feeds the classifier a perturbed version of the input. Our approach is training-free and theoretically supported. We verify its effectiveness by combining our layer with both nominally and robustly trained models and by running large-scale experiments, ranging from black-box to adaptive attacks, on CIFAR10, CIFAR100, and ImageNet. Our layer significantly enhances model robustness at no cost to clean accuracy.
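The idea in the abstract can be illustrated with a minimal sketch. It assumes a toy linear softmax classifier in pure Python and takes a few signed-gradient steps that lower the cross-entropy loss with respect to the model's own predicted label, which is the opposite of what an attacker does to the true label. The names `anti_adversary`, `steps`, and `step_size`, the weights, and the input are all hypothetical choices for illustration, not the authors' implementation or settings.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(W, b, x):
    # Toy linear classifier (an assumption): logits = W x + b.
    logits = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
    return softmax(logits)

def loss_grad_wrt_input(W, b, x, label):
    # Gradient of cross-entropy w.r.t. x for a linear softmax model:
    # W^T (p - onehot(label)).
    p = predict_proba(W, b, x)
    residual = [pi - (1.0 if i == label else 0.0) for i, pi in enumerate(p)]
    return [sum(W[i][j] * residual[i] for i in range(len(W))) for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def anti_adversary(W, b, x, steps=2, step_size=0.1):
    # Use the model's own prediction as a pseudo-label; no true label is needed.
    probs = predict_proba(W, b, x)
    pseudo_label = max(range(len(probs)), key=probs.__getitem__)
    x_anti = list(x)
    for _ in range(steps):
        grad = loss_grad_wrt_input(W, b, x_anti, pseudo_label)
        # Step against the loss gradient, i.e. opposite to an adversarial step.
        x_anti = [xj - step_size * sign(gj) for xj, gj in zip(x_anti, grad)]
    return x_anti, pseudo_label

# Hypothetical two-class, two-feature example.
W = [[1.0, -0.5], [-1.0, 0.5]]
b = [0.0, 0.0]
x = [0.2, 0.1]

x_anti, label = anti_adversary(W, b, x)
before = predict_proba(W, b, x)[label]
after = predict_proba(W, b, x_anti)[label]
print(f"confidence in predicted class: {before:.3f} -> {after:.3f}")
```

In this sketch the classifier is then evaluated on `x_anti` instead of `x`. The predicted class stays the same while its confidence rises, so an attacker with the same perturbation budget would have more margin to overcome. No parameters are trained, which matches the training-free claim; the cost is a few extra gradient evaluations per input.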

updated: Thu Dec 16 2021 17:21:21 GMT+0000 (UTC)

published: Fri Mar 26 2021 09:36:59 GMT+0000 (UTC)

arXiv
