Alleviating Robust Overfitting of Adversarial Training With Consistency Regularization

Shudong Zhang; Haichang Gao; Tianwei Zhang; Yunyi Zhou; Zihui Wu

Adversarial training (AT) has proven to be one of the most effective ways to defend Deep Neural Networks (DNNs) against adversarial attacks. However, AT consistently suffers from robust overfitting, i.e., robustness drops sharply at a certain stage of training. Reducing this robust generalization gap is essential to obtaining a robust model. In this paper, we present an in-depth study of robust overfitting from a new angle. We observe that consistency regularization, a popular technique in semi-supervised learning, has a goal similar to that of AT and can be used to alleviate robust overfitting. We empirically validate this observation and find that a majority of prior solutions have implicit connections to consistency regularization. Motivated by this, we introduce a new AT solution that integrates consistency regularization and the Mean Teacher (MT) strategy into AT. Specifically, we introduce a teacher model whose weights are the average of the student model's weights over the training steps. We then design a consistency loss function that makes the prediction distribution of the student model on adversarial examples consistent with that of the teacher model on clean samples. Experiments show that the proposed method effectively alleviates robust overfitting and improves the robustness of DNN models against common adversarial attacks.
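The two ingredients named in the abstract, a weight-averaged teacher and a consistency loss between student and teacher predictions, can be illustrated with a minimal dependency-free sketch. Everything specific here is an assumption rather than the authors' formulation: the abstract does not give the averaging rule or the loss, so the sketch uses an exponential moving average (as in the usual Mean Teacher recipe) and a KL divergence, and the names `ema_update`, `consistency_loss`, and `decay` are hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ema_update(teacher_weights, student_weights, decay=0.999):
    """Move each teacher weight toward the matching student weight.

    Assumption: an exponential moving average stands in for the
    "average weights of the student over the training steps".
    """
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_weights, student_weights)]


def consistency_loss(student_adv_logits, teacher_clean_logits, eps=1e-12):
    """KL(teacher on clean input || student on adversarial input).

    Assumption: KL divergence is one plausible consistency measure;
    the abstract does not state which one the paper uses.
    """
    p = softmax(teacher_clean_logits)
    q = softmax(student_adv_logits)
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))


# Usage: one hypothetical training step on toy numbers.
teacher = ema_update([1.0, 2.0], [0.0, 0.0], decay=0.9)
loss = consistency_loss([2.0, 0.5, 0.1], [0.1, 0.5, 2.0])
```

In a real training loop the consistency term would be added to the usual adversarial classification loss for the student, and only the student would receive gradients; the teacher changes solely through the averaging step.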

updated: Tue May 24 2022 03:18:43 GMT+0000 (UTC)

published: Tue May 24 2022 03:18:43 GMT+0000 (UTC)

arXiv
