Adversarial Fine-tuning for Backdoor Defense: Connecting Backdoor Attacks to Adversarial Attacks

Bingxu Mu; Zhenxing Niu; Le Wang; Xue Wang; Rong Jin; Gang Hua

Deep neural networks (DNNs) are known to be vulnerable to both backdoor attacks and adversarial attacks. In the literature, these two types of attacks are commonly treated as distinct problems and solved separately, since they are training-time and inference-time attacks, respectively. However, in this paper we find an intriguing connection between them: for a model planted with backdoors, we observe that its adversarial examples behave similarly to its triggered samples, i.e., both activate the same subset of DNN neurons. This indicates that planting a backdoor into a model significantly affects the model's adversarial examples. Based on this observation, we design a new Adversarial Fine-Tuning (AFT) algorithm to defend against backdoor attacks. We empirically show that, against 5 state-of-the-art backdoor attacks, AFT effectively erases the backdoor triggers without obvious performance degradation on clean samples and significantly outperforms existing defense methods.
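
The observation that adversarial examples and triggered samples "activate the same subset of DNN neurons" can be made concrete with a small overlap measure. The following is only an illustrative sketch, not the metric or procedure used in the paper: the function names, the choice of top-k selection with Jaccard overlap, and the activation values are all assumptions made for this example.

```python
def top_k_neurons(activations, k):
    """Return the indices of the k largest activations as a set."""
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    return set(ranked[:k])


def activation_overlap(acts_a, acts_b, k):
    """Jaccard overlap between the top-k neuron sets of two activation vectors."""
    a = top_k_neurons(acts_a, k)
    b = top_k_neurons(acts_b, k)
    return len(a & b) / len(a | b)


# Hypothetical activations of one 6-neuron layer for three inputs
# (made-up numbers, chosen only to illustrate the idea).
triggered = [0.1, 2.3, 0.0, 1.9, 0.2, 0.1]    # sample carrying the backdoor trigger
adversarial = [0.2, 2.0, 0.1, 2.2, 0.0, 0.3]  # adversarial example
clean = [1.8, 0.1, 2.1, 0.2, 0.1, 0.0]        # unmodified sample

print(activation_overlap(triggered, adversarial, k=2))  # same top-2 neurons
print(activation_overlap(triggered, clean, k=2))        # disjoint top-2 neurons
```

In a real model the activation vectors would come from an intermediate layer of the backdoored network; a high overlap between triggered and adversarial inputs, relative to clean inputs, is the kind of signal the abstract describes.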

updated: Wed Mar 23 2022 15:03:53 GMT+0000 (UTC)

published: Sun Feb 13 2022 13:41:15 GMT+0000 (UTC)

arXiv
