Adversarial Training with Rectified Rejection

Tianyu Pang; Huishuai Zhang; Di He; Yinpeng Dong; Hang Su; Wei Chen; Jun Zhu; Tie-Yan Liu


Adversarial training (AT) is one of the most effective strategies for promoting model robustness, yet even state-of-the-art adversarially trained models struggle to exceed 60% robust test accuracy on CIFAR-10 without additional data, which is far from practical. A natural way to break this accuracy bottleneck is to introduce a rejection option, where confidence is a commonly used certainty proxy. However, vanilla confidence can overestimate the model's certainty when the input is wrongly classified. To address this, we propose to use true confidence (T-Con), i.e., the predicted probability of the true class, as a certainty oracle, and learn to predict T-Con by rectifying confidence. We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones, even under adaptive attacks. We also show quantitatively that training R-Con to align with T-Con could be an easier task than learning robust classifiers. In our experiments, we evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks, and demonstrate that the RR module is compatible with different AT frameworks for improving robustness, with little extra computation.
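The difference between confidence and T-Con, and the idea of coupling two rejectors, can be illustrated with a small sketch. This is not the authors' implementation. The function names, the scalar `rectifier` (a stand-in for whatever learned quantity rectifies the confidence), and the threshold values are all assumptions made for illustration. In practice the true label is unknown at test time, which is why T-Con has to be predicted rather than computed.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence(probs):
    """Vanilla confidence: probability of the predicted (argmax) class."""
    return max(probs)

def true_confidence(probs, true_label):
    """T-Con: probability assigned to the true class (needs the label)."""
    return probs[true_label]

def rectified_confidence(probs, rectifier):
    """R-Con sketch: confidence scaled by a hypothetical rectifier in [0, 1]."""
    return confidence(probs) * rectifier

def accept(probs, rectifier, conf_threshold=0.5, rcon_threshold=0.5):
    """Coupled rejector sketch: accept only if both scores pass.

    The threshold values are illustrative assumptions, not taken from the paper.
    """
    return (confidence(probs) >= conf_threshold
            and rectified_confidence(probs, rectifier) >= rcon_threshold)

probs = softmax([2.0, 0.5, 0.1])      # the model predicts class 0

# If the true class is 1, the input is misclassified and
# vanilla confidence overestimates the certainty given by T-Con.
conf = confidence(probs)
tcon_wrong = true_confidence(probs, 1)

# An ideal rectifier would make R-Con equal to T-Con.
ideal_rectifier = tcon_wrong / conf
rejected = not accept(probs, ideal_rectifier)

# If the true class is 0, T-Con equals confidence and the rectifier is 1.
accepted = accept(probs, 1.0)
```

With an ideal rectifier, the misclassified input is rejected even though its vanilla confidence is high, while the correctly classified input is accepted. A confidence threshold alone would have accepted both.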

updated: Mon May 31 2021 08:24:53 GMT+0000 (UTC)

published: Mon May 31 2021 08:24:53 GMT+0000 (UTC)

arXiv
