Analysis and Applications of Class-wise Robustness in Adversarial Training

Qi Tian; Kun Kuang; Kelu Jiang; Fei Wu; Yisen Wang

Adversarial training is one of the most effective approaches for improving model robustness against adversarial examples. However, previous work has focused mainly on the overall robustness of the model, and an in-depth analysis of the role each class plays in adversarial training is still missing. In this paper, we analyze class-wise robustness in adversarial training. First, we provide a detailed diagnosis of adversarial training on six benchmark datasets: MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10, and ImageNet. Surprisingly, we find remarkable robustness discrepancies among classes, which lead to unbalanced (unfair) class-wise robustness in robust models. We further investigate the relations between classes and find that the unbalanced class-wise robustness is largely consistent across different attack and defense methods. Moreover, we observe that stronger attack methods in adversarial learning gain their performance improvement mainly from more successful attacks on the vulnerable classes (i.e., classes with lower robustness). Inspired by these findings, we design a simple but effective attack based on the traditional PGD attack, named the Temperature-PGD attack, which enlarges the robustness disparity among classes by applying a temperature factor to the confidence distribution of each image. Experiments demonstrate that our method achieves a higher attack rate than the PGD attack. From the defense perspective, we also make some modifications in the training and inference phases to improve the robustness of the most vulnerable class and thereby mitigate the large differences in class-wise robustness. We believe our work contributes to a more comprehensive understanding of adversarial training and to rethinking the class-wise properties of robust models.
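As a rough illustration of the two quantities the abstract refers to, the sketch below computes per-class accuracy on adversarial inputs (one common way to measure class-wise robustness) and a standard temperature-scaled softmax over logits. The abstract does not give the paper's formulas, so the function names, the toy labels and predictions, and the way temperature is applied here are all assumptions for illustration, not the authors' Temperature-PGD implementation.

```python
import math
from collections import defaultdict


def class_wise_accuracy(labels, predictions):
    """Return {class: fraction of that class's samples predicted correctly}."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for y, p in zip(labels, predictions):
        total[y] += 1
        if y == p:
            correct[y] += 1
    return {c: correct[c] / total[c] for c in total}


def temperature_softmax(logits, temperature):
    """Standard softmax over logits / temperature (numerically stabilised).

    Assumption: this is the generic temperature-scaled softmax; how the
    paper plugs a temperature factor into PGD is not stated in the abstract.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


# Hypothetical toy data: true labels and model predictions on adversarial inputs.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
adv_predictions = [0, 0, 0, 1, 0, 0, 1, 0]

robust_acc = class_wise_accuracy(labels, adv_predictions)
# The class with the lowest robust accuracy is the "most vulnerable" one.
most_vulnerable = min(robust_acc, key=robust_acc.get)
```

In this toy case the two classes end up with clearly different robust accuracies even though the overall accuracy is a single number, which is the kind of discrepancy the abstract describes. A temperature below 1 sharpens the confidence distribution and a temperature above 1 flattens it.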

updated: Wed Jun 23 2021 05:00:03 GMT+0000 (UTC)

published: Sat May 29 2021 07:28:35 GMT+0000 (UTC)

arXiv
