Robustness May Be at Odds with Fairness: An Empirical Study on Class-wise Accuracy

Philipp Benz; Chaoning Zhang; Adil Karjauv; In So Kweon

Convolutional neural networks (CNNs) have made significant advances; however, they are widely known to be vulnerable to adversarial attacks. Adversarial training is the most widely used technique for improving adversarial robustness to strong white-box attacks. Prior works have evaluated and improved average model robustness without class-wise evaluation. Average evaluation alone might provide a false sense of robustness. For example, an attacker can focus on the most vulnerable class, which can be dangerous, especially when that class is a critical one, such as "human" in autonomous driving. We present an empirical study on the class-wise accuracy and robustness of adversarially trained models. We find that an inter-class discrepancy in accuracy and robustness exists even when the training dataset has an equal number of samples for each class. For example, in CIFAR10, "cat" is much more vulnerable than other classes. Moreover, this inter-class discrepancy also exists for normally trained models, while adversarial training tends to further increase it. Our work aims to investigate the following questions: (a) Is the phenomenon of inter-class discrepancy universal regardless of datasets, model architectures, and optimization hyper-parameters? (b) If so, what are possible explanations for the inter-class discrepancy? (c) Can the techniques proposed for long-tail classification be readily extended to adversarial training to address the inter-class discrepancy?
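
To make the point about average versus class-wise evaluation concrete, the following is a minimal sketch of how per-class accuracy might be computed next to the overall average. The helper name `class_wise_accuracy` and the toy labels and predictions are hypothetical, invented for illustration only; they are not results or code from the paper.

```python
from collections import defaultdict


def class_wise_accuracy(labels, predictions):
    """Return {class: accuracy}, computed separately for each true class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, predictions):
        total[y] += 1
        correct[y] += int(y == p)
    return {c: correct[c] / total[c] for c in total}


# Hypothetical toy data: a balanced set with 4 samples per class.
labels = ["cat"] * 4 + ["dog"] * 4 + ["ship"] * 4
predictions = [
    "cat", "dog", "dog", "ship",     # true class: cat
    "dog", "dog", "dog", "cat",      # true class: dog
    "ship", "ship", "ship", "ship",  # true class: ship
]

per_class = class_wise_accuracy(labels, predictions)
average = sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)
worst_class = min(per_class, key=per_class.get)

print(f"average accuracy: {average:.3f}")
for name, acc in per_class.items():
    print(f"{name}: {acc:.2f}")
print(f"most vulnerable class: {worst_class}")
```

In this toy setting the average hides a weak class even though every class has the same number of samples. The same bookkeeping would apply to robust accuracy if `predictions` were taken on adversarially perturbed inputs.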

updated: Sun Oct 10 2021 18:23:13 GMT+0000 (UTC)

published: Mon Oct 26 2020 06:32:32 GMT+0000 (UTC)

arXiv
