Backdoor Defense via Suppressing Model Shortcuts

Sheng Yang; Yiming Li; Yong Jiang; Shu-Tao Xia

Recent studies have demonstrated that deep neural networks (DNNs) are vulnerable to backdoor attacks during the training process. Specifically, adversaries embed hidden backdoors in DNNs so that malicious model predictions can be activated by pre-defined trigger patterns. In this paper, we explore the backdoor mechanism from the perspective of model structure. We select the skip connection for discussion, inspired by the understanding that it helps the model learn "shortcuts", through which backdoor triggers are usually easier to learn. Specifically, we demonstrate that the attack success rate (ASR) decreases significantly when the outputs of some key skip connections are reduced. Based on this observation, we design a simple yet effective backdoor removal method that suppresses the skip connections in critical layers selected by our method. We also fine-tune these layers to recover high benign accuracy and to further reduce the ASR. Extensive experiments on benchmark datasets verify the effectiveness of our method.
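To make the idea of "reducing the output of a skip connection" concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a residual block of the form branch(x) + gamma * x, where the scaling factor `gamma` is a hypothetical name introduced here for illustration. The `attack_success_rate` helper uses a commonly used definition of ASR (the fraction of triggered inputs from non-target classes that are classified as the target class); the abstract itself does not spell out the definition, so treat it as an assumption.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def residual_block(x: Vector, branch: Callable[[Vector], Vector],
                   gamma: float = 1.0) -> Vector:
    """Return branch(x) + gamma * x.

    gamma (hypothetical parameter) scales the skip connection:
    gamma = 1.0 is an ordinary residual block, gamma < 1.0 suppresses the skip path.
    """
    fx = branch(x)
    if len(fx) != len(x):
        raise ValueError("branch output must have the same length as the input")
    return [f + gamma * xi for f, xi in zip(fx, x)]


def attack_success_rate(preds: Sequence[int], true_labels: Sequence[int],
                        target: int) -> float:
    """Assumed ASR definition: share of triggered, non-target-class inputs
    that the model predicts as the target class."""
    eligible = [p for p, t in zip(preds, true_labels) if t != target]
    if not eligible:
        return 0.0
    return sum(1 for p in eligible if p == target) / len(eligible)


def halve(x: Vector) -> Vector:
    """Toy stand-in for the residual branch (conv layers in a real network)."""
    return [0.5 * v for v in x]


x = [2.0, -4.0]
full = residual_block(x, halve, gamma=1.0)         # skip path at full strength
suppressed = residual_block(x, halve, gamma=0.25)  # skip path scaled down

# Predictions on triggered inputs; the last sample already belongs to the
# target class and is excluded from the ASR under the assumed definition.
asr = attack_success_rate(preds=[0, 0, 1, 0], true_labels=[1, 2, 3, 0], target=0)
```

In a real network the same scaling would be applied only in the layers selected as critical and would be followed by fine-tuning of those layers, as the abstract describes; how those layers are chosen is not specified here and is not modeled in this sketch.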

updated: Mon Mar 06 2023 02:31:54 GMT+0000 (UTC)

published: Wed Nov 02 2022 15:39:19 GMT+0000 (UTC)

arXiv
