Improving White-box Robustness of Pre-processing Defenses via Joint Adversarial Training

Dawei Zhou; Nannan Wang; Xinbo Gao; Bo Han; Jun Yu; Xiaoyu Wang; Tongliang Liu

Deep neural networks (DNNs) are vulnerable to adversarial noise. A range of adversarial defense techniques have been proposed to mitigate the interference of adversarial noise, among which input pre-processing methods are scalable and show great potential to safeguard DNNs. However, pre-processing methods may suffer from the robustness degradation effect, in which the defense reduces rather than improves the adversarial robustness of a target model in a white-box setting. A potential cause of this negative effect is that the adversarial training examples are static and independent of the pre-processing model. To solve this problem, we investigate the influence of full adversarial examples, which are crafted against the full model, and find that they indeed have a positive impact on the robustness of defenses. Furthermore, we find that simply changing the adversarial training examples in pre-processing methods does not completely alleviate the robustness degradation effect. This is because the adversarial risk of the pre-processed model is neglected, which is another cause of the robustness degradation effect. Motivated by the above analyses, we propose a method called Joint Adversarial Training based Pre-processing (JATP) defense. Specifically, we formulate a feature-similarity-based adversarial risk for the pre-processing model by using full adversarial examples found in a feature space. Unlike standard adversarial training, we update only the pre-processing model, which prompts us to introduce a pixel-wise loss to improve its cross-model transferability. We then conduct joint adversarial training on the pre-processing model to minimize this overall risk. Empirical results show that our method effectively mitigates the robustness degradation effect across different target models in comparison to previous state-of-the-art approaches.
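
The abstract names two ingredients of the training objective, a feature-similarity term and a pixel-wise term, but does not give their formulas. The sketch below is only an illustration of how two such terms could be combined into one scalar loss for the pre-processing model. The use of cosine similarity, mean squared error, the function names, and the `pixel_weight` parameter are all assumptions made for this example and are not taken from the paper. Images and features are represented as flat lists of floats to keep the example dependency-free.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def feature_similarity_loss(feat_defended, feat_clean):
    """Assumed form: 1 - cos(features of defended input, features of clean input)."""
    return 1.0 - cosine_similarity(feat_defended, feat_clean)


def pixel_loss(img_defended, img_clean):
    """Assumed form: mean squared error between defended and clean pixels."""
    n = len(img_clean)
    return sum((a - b) ** 2 for a, b in zip(img_defended, img_clean)) / n


def joint_loss(img_defended, img_clean, feat_defended, feat_clean,
               pixel_weight=1.0):
    """Hypothetical overall risk: feature term plus weighted pixel term.

    In a training loop only the pre-processing model's parameters would be
    updated to reduce this value; the target model stays fixed.
    """
    return (feature_similarity_loss(feat_defended, feat_clean)
            + pixel_weight * pixel_loss(img_defended, img_clean))
```

For instance, `joint_loss([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [0.0, 1.0], pixel_weight=0.5)` combines a feature term of 1.0 (orthogonal features) with a pixel term of 1.0 scaled by 0.5. In an actual defense, the defended image would be the pre-processing model's output on a full adversarial example, and the features would come from the target model.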

updated: Thu Jun 10 2021 01:45:32 GMT+0000 (UTC)

published: Thu Jun 10 2021 01:45:32 GMT+0000 (UTC)

arXiv
