Dual Head Adversarial Training

Yujing Jiang; Xingjun Ma; Sarah Monazam Erfani; James Bailey

Deep neural networks (DNNs) are known to be vulnerable to adversarial examples (attacks), which raises concerns about their reliability in safety-critical applications. A number of defense methods have been proposed to train robust DNNs that resist adversarial attacks, among which adversarial training has so far demonstrated the most promising results. However, recent studies have shown that there is an inherent tradeoff between accuracy and robustness in adversarially trained DNNs. In this paper, we propose a novel technique, Dual Head Adversarial Training (DH-AT), to further improve the robustness of existing adversarial training methods. Unlike existing improved variants of adversarial training, DH-AT modifies both the architecture of the network and the training strategy to gain more robustness. Specifically, DH-AT first attaches a second network head (or branch) to one intermediate layer of the network, then uses a lightweight convolutional neural network (CNN) to aggregate the outputs of the two heads. The training strategy is also adapted to reflect the relative importance of the two heads. We show empirically, on multiple benchmark datasets, that DH-AT brings notable robustness improvements to existing adversarial training methods. Compared with TRADES, a state-of-the-art adversarial training method, DH-AT improves robustness by 3.4% against PGD40 and 2.3% against AutoAttack, and also improves clean accuracy by 1.8%.
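
The architecture described in the abstract (a shared network, a second head branching off one intermediate layer, and a small convolutional aggregator over the two heads' outputs) can be illustrated with a toy forward pass. This is only a minimal sketch under stated assumptions, not the authors' implementation: the class name `DualHeadNet`, the layer sizes, the use of plain linear layers, and the choice of two 1x1 convolutions as the "lightweight CNN" are all hypothetical, and the adapted training strategy is not shown because the abstract does not specify it.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


class DualHeadNet:
    """Toy dual-head classifier (hypothetical sizes, random weights)."""

    def __init__(self, in_dim=8, hidden=16, num_classes=4, agg_channels=3, seed=0):
        rng = np.random.default_rng(seed)
        # Shared layers up to the (assumed) branching point.
        self.w_trunk = rng.normal(0.0, 0.1, (in_dim, hidden))
        # Head 1: stands in for the remainder of the original network.
        self.w_head1 = rng.normal(0.0, 0.1, (hidden, num_classes))
        # Head 2: the extra branch attached at the intermediate layer.
        self.w_head2 = rng.normal(0.0, 0.1, (hidden, num_classes))
        # Aggregator: two 1x1 convolutions over the 2-channel logit stack
        # (an assumption; the abstract only says "lightweight CNN").
        self.w_mix1 = rng.normal(0.0, 0.1, (agg_channels, 2))
        self.w_mix2 = rng.normal(0.0, 0.1, (1, agg_channels))

    def forward(self, x):
        feat = relu(x @ self.w_trunk)             # (batch, hidden)
        z1 = feat @ self.w_head1                  # (batch, num_classes)
        z2 = feat @ self.w_head2                  # (batch, num_classes)
        stacked = np.stack([z1, z2], axis=1)      # (batch, 2, num_classes)
        # A 1x1 convolution mixes channels independently at each class position.
        hid = relu(np.einsum("oc,bcn->bon", self.w_mix1, stacked))
        out = np.einsum("oc,bcn->bon", self.w_mix2, hid)[:, 0, :]
        return z1, z2, out


net = DualHeadNet()
z1, z2, out = net.forward(np.ones((5, 8)))
print(out.shape)  # (5, 4)
```

Returning both per-head logits alongside the aggregated output keeps the sketch open to a training loss that weights the two heads differently, which is the kind of adaptation the abstract mentions without detailing.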

updated: Wed Apr 21 2021 06:31:33 GMT+0000 (UTC)

published: Wed Apr 21 2021 06:31:33 GMT+0000 (UTC)

arXiv
