Analysis and Applications of Class-wise Robustness in Adversarial Training

Qi Tian; Kun Kuang; Kelu Jiang; Fei Wu; Yisen Wang

Adversarial training is one of the most effective approaches to improving model robustness against adversarial examples. However, previous work focuses mainly on the overall robustness of the model, and an in-depth analysis of the role each class plays in adversarial training is still missing. In this paper, we propose to analyze class-wise robustness in adversarial training. First, we provide a detailed diagnosis of adversarial training on six benchmark datasets: MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet. Surprisingly, we find remarkable robustness discrepancies among classes, which lead to unbalanced/unfair class-wise robustness in the robust models. We further investigate the relations between classes and find that the unbalanced class-wise robustness is largely consistent across different attack and defense methods. Moreover, we observe that stronger attack methods in adversarial learning gain their performance improvement mainly from more successful attacks on the vulnerable classes (i.e., classes with less robustness). Inspired by these findings, we design a simple but effective attack method based on the traditional PGD attack, named the Temperature-PGD attack, which enlarges the robustness disparity among classes by applying a temperature factor to the confidence distribution of each image. Experiments demonstrate that our method achieves a higher attack rate than the PGD attack. From the defense perspective, we also make some modifications to the training and inference phases to improve the robustness of the most vulnerable class, so as to mitigate the large difference in class-wise robustness. We believe our work can contribute to a more comprehensive understanding of adversarial training and prompt a rethinking of class-wise properties in robust models.
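The abstract does not give formulas, so the following is only a minimal NumPy sketch of the two ideas it names: measuring accuracy per class rather than overall, and a PGD-style attack whose loss uses a temperature-scaled softmax. Everything here is an assumption made for illustration: the function names, the toy linear softmax classifier (`W`, `b`), the L-infinity budget `eps`, the step size `alpha`, and in particular the choice to apply the temperature by dividing the logits before the softmax. It is not the authors' implementation of Temperature-PGD.

```python
import numpy as np


def classwise_accuracy(y_true, y_pred, num_classes):
    """Per-class accuracy; NaN for classes that never occur in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    acc = np.full(num_classes, np.nan)
    for c in range(num_classes):
        mask = y_true == c
        if mask.any():
            acc[c] = np.mean(y_pred[mask] == c)
    return acc


def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / temperature (assumed form of the temperature factor)."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def temperature_pgd_linear(x, y, W, b, eps=0.1, alpha=0.02, steps=10,
                           temperature=1.0):
    """PGD-style L-infinity attack on a toy linear softmax classifier.

    Hypothetical sketch: the loss is cross-entropy on softmax(logits / T),
    whose gradient with respect to the input is (p - onehot) @ W.T / T.
    Inputs are assumed to lie in [0, 1].
    """
    x = np.asarray(x, dtype=float)
    onehot = np.eye(W.shape[1])[y]
    x_adv = x.copy()
    for _ in range(steps):
        probs = softmax_with_temperature(x_adv @ W + b, temperature)
        grad = (probs - onehot) @ W.T / temperature
        x_adv = x_adv + alpha * np.sign(grad)     # gradient-sign ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)          # stay in the valid input range
    return x_adv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((200, 8))             # toy inputs in [0, 1]
    W = rng.normal(size=(8, 3))          # toy classifier weights (assumed)
    b = np.zeros(3)
    y = np.argmax(x @ W + b, axis=1)     # labels the clean model gets right
    x_adv = temperature_pgd_linear(x, y, W, b, temperature=0.5)
    y_adv = np.argmax(x_adv @ W + b, axis=1)
    print("class-wise robust accuracy:", classwise_accuracy(y, y_adv, 3))
```

Reporting the full per-class vector, rather than its mean, is what exposes the kind of disparity the abstract describes; the temperature here only changes how sharply the confidence distribution weights each class inside the attack loss.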

updated: Wed Jun 02 2021 18:00:46 GMT+0000 (UTC)

published: Sat May 29 2021 07:28:35 GMT+0000 (UTC)

arXiv
