Adversarial Training with Rectified Rejection

Tianyu Pang; Huishuai Zhang; Di He; Yinpeng Dong; Hang Su; Wei Chen; Jun Zhu; Tie-Yan Liu

Adversarial training (AT) is one of the most effective strategies for promoting model robustness, yet even state-of-the-art adversarially trained models struggle to exceed 65% robust test accuracy on CIFAR-10 without additional data, which is far from practical. A natural way to move past this accuracy bottleneck is to introduce a rejection option, with confidence as a commonly used certainty proxy. However, vanilla confidence can overestimate the model's certainty when the input is wrongly classified. We therefore propose to use true confidence (T-Con), i.e., the predicted probability of the true class, as a certainty oracle, and to learn to predict T-Con by rectifying confidence. We prove that under mild conditions, a rectified confidence (R-Con) rejector and a confidence rejector can be coupled to distinguish any wrongly classified input from correctly classified ones. We also quantify that training R-Con to align with T-Con could be an easier task than learning robust classifiers. In our experiments, we evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks, and demonstrate that the RR module is compatible with different AT frameworks for improving robustness, at little extra computation.
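To make the quantities in the abstract concrete, the following minimal Python sketch computes confidence and T-Con from a softmax output and applies a coupled rejection rule. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how confidence is rectified, so the multiplicative form (confidence times an auxiliary score in [0, 1]) is assumed here, and the function names, the `aux_score` parameter, and both thresholds of 0.5 are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence(probs):
    """Vanilla confidence: the probability of the predicted (argmax) class."""
    return max(probs)


def true_confidence(probs, true_label):
    """T-Con: the predicted probability of the true class.
    It needs the label, so it is only available as a training-time oracle."""
    return probs[true_label]


def rectified_confidence(probs, aux_score):
    """R-Con under an ASSUMED multiplicative rectification:
    confidence scaled by an auxiliary score in [0, 1].
    In practice aux_score would come from a learned module."""
    return confidence(probs) * aux_score


def accept(probs, aux_score, conf_threshold=0.5, rcon_threshold=0.5):
    """Coupled rejector (hypothetical thresholds): accept an input only
    if both confidence and R-Con exceed their thresholds."""
    return (confidence(probs) > conf_threshold
            and rectified_confidence(probs, aux_score) > rcon_threshold)


# The model predicts class 0 with high confidence.
probs = softmax([2.0, 0.5, 0.1])

# If the true class is 0, T-Con equals confidence.
# If the true class is 1, T-Con is much lower than confidence,
# which is the overestimation the abstract describes.
tcon_correct = true_confidence(probs, 0)
tcon_wrong = true_confidence(probs, 1)

# An auxiliary score equal to T-Con / confidence makes R-Con match T-Con.
ideal_aux_wrong = tcon_wrong / confidence(probs)
```

With these toy logits, a misclassified input whose auxiliary score is close to `ideal_aux_wrong` is rejected by `accept`, while a correctly classified input with an auxiliary score near 1 is accepted. The thresholds here are chosen only for illustration and are not the conditions proved in the paper.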

updated: Wed Oct 06 2021 06:04:40 GMT+0000 (UTC)

published: Mon May 31 2021 08:24:53 GMT+0000 (UTC)

arXiv
