Mitigating the Accuracy-Robustness Trade-off via Multi-Teacher Adversarial Distillation

Shiji Zhao; Xizhe Wang; Xingxing Wei

マルチ教師による敵対的蒸留による精度と堅牢性のトレードオフの軽減

敵対的トレーニングは、敵対的攻撃に対するディープニューラルネットワークの堅牢性を向上させるための実用的なアプローチです。信頼性の高い堅牢性をもたらしますが、クリーンなサンプルに対するパフォーマンスは、敵対的トレーニング後に悪影響を受けます。これは、精度と堅牢性の間にトレードオフが存在することを意味します。最近、いくつかの研究では、敵対的トレーニングに知識蒸留手法を使用し、堅牢性の向上において競争力のあるパフォーマンスを達成しようとしていますが、クリーンなサンプルの精度にはまだ限界があります。このペーパーでは、精度と堅牢性のトレードオフを軽減するために、マルチ教師敵対的堅牢性蒸留 (MTARD) を導入し、クリーンな例を処理するための強力なクリーン教師と強力で堅牢な教師を適用することにより、モデルの敵対的トレーニングプロセスをガイドします。それぞれ敵対的な例。最適化プロセス中に、さまざまな教師が同様の知識スケールを示すように、教師の温度を調整し、教師の情報エントロピーの一貫性を保つエントロピーベースのバランスアルゴリズムを設計します。さらに、学生が複数の教師から比較的一貫した学習速度を得られるようにするために、さまざまな種類の知識の学習の重みを調整する正規化損失バランスアルゴリズムを提案します。公開データセットに対して行われた一連の実験では、MTARD がさまざまな敵対的攻撃に対して、最先端の敵対的トレーニングおよび蒸留手法よりも優れていることが実証されています。

Adversarial training is a practical approach for improving the robustness of deep neural networks against adversarial attacks. Although bringing reliable robustness, the performance toward clean examples is negatively affected after adversarial training, which means a trade-off exists between accuracy and robustness. Recently, some studies have tried to use knowledge distillation methods in adversarial training, achieving competitive performance in improving the robustness but the accuracy for clean samples is still limited. In this paper, to mitigate the accuracy-robustness trade-off, we introduce the Multi-Teacher Adversarial Robustness Distillation (MTARD) to guide the model's adversarial training process by applying a strong clean teacher and a strong robust teacher to handle the clean examples and adversarial examples, respectively. During the optimization process, to ensure that different teachers show similar knowledge scales, we design the Entropy-Based Balance algorithm to adjust the teacher's temperature and keep the teachers' information entropy consistent. Besides, to ensure that the student has a relatively consistent learning speed from multiple teachers, we propose the Normalization Loss Balance algorithm to adjust the learning weights of different types of knowledge. A series of experiments conducted on public datasets demonstrate that MTARD outperforms the state-of-the-art adversarial training and distillation methods against various adversarial attacks.

updated: Wed Jun 28 2023 12:47:01 GMT+0000 (UTC)

published: Wed Jun 28 2023 12:47:01 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト