Attack-Agnostic Adversarial Detection

Jiaxin Cheng; Mohamed Hussein; Jay Billa; Wael AbdAlmageed

The growing number of adversarial attacks in recent years gives attackers an advantage over defenders: defenders must train detectors after learning the types of attacks, and must maintain many models to ensure good performance in detecting any upcoming attacks. We propose a way to end the tug-of-war between attackers and defenders by treating adversarial attack detection as an anomaly detection problem, so that the detector is agnostic to the attack. We quantify the statistical deviation caused by adversarial perturbations in two aspects. The Least Significant Component Feature (LSCF) quantifies the deviation of adversarial examples from the statistics of benign samples, and the Hessian Feature (HF) reflects how adversarial examples distort the landscape of the model's optima by measuring the local loss curvature. Empirical results show that our method achieves an overall ROC AUC of 94.9%, 89.7%, and 94.6% on CIFAR10, CIFAR100, and SVHN, respectively, and performs comparably to adversarial detectors trained with adversarial examples on most of the attacks.
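
The abstract does not give the exact definitions of LSCF or HF, so the following is only a rough sketch of the general idea it describes: measure a local loss-curvature statistic, fit its distribution on benign samples alone, and flag inputs whose statistic deviates strongly. Everything here is an assumption made for illustration, not the authors' implementation: `toy_loss` is a hypothetical stand-in for a model's loss, `directional_curvature` is a plain central finite-difference estimate of curvature along one direction, and the z-score in `anomaly_score` is one simple choice of anomaly measure.

```python
import math
import statistics


def directional_curvature(loss, x, direction, h=1e-3):
    """Central finite-difference estimate of the loss curvature d^T H d at x.

    `direction` is normalized to unit length first, so the result is the
    second derivative of the loss along that direction.
    """
    norm = math.sqrt(sum(d * d for d in direction))
    d = [di / norm for di in direction]
    x_plus = [xi + h * di for xi, di in zip(x, d)]
    x_minus = [xi - h * di for xi, di in zip(x, d)]
    return (loss(x_plus) - 2.0 * loss(x) + loss(x_minus)) / (h * h)


def fit_benign_stats(features):
    """Mean and population standard deviation of a benign-only feature set."""
    return statistics.fmean(features), statistics.pstdev(features)


def anomaly_score(feature, mean, std):
    """Absolute z-score of a feature relative to the benign statistics."""
    return abs(feature - mean) / (std + 1e-12)


def toy_loss(x):
    """Hypothetical loss: its curvature grows as x moves away from the origin."""
    return sum(xi ** 2 + xi ** 4 for xi in x)


# Fit statistics on "benign" inputs only (no attack examples are needed).
direction = [1.0, 0.0]
benign_inputs = [[0.0, 0.0], [0.1, 0.0], [-0.1, 0.0], [0.05, 0.0], [-0.05, 0.0]]
benign_curvatures = [
    directional_curvature(toy_loss, x, direction) for x in benign_inputs
]
mean, std = fit_benign_stats(benign_curvatures)

# Score a far-away input that stands in for a perturbed example.
suspect_curvature = directional_curvature(toy_loss, [1.0, 0.0], direction)
suspect_score = anomaly_score(suspect_curvature, mean, std)
benign_scores = [anomaly_score(c, mean, std) for c in benign_curvatures]
```

In this toy setting the suspect input receives a much larger score than any benign input, which is the property an attack-agnostic detector relies on: the decision threshold is set from benign data only, without reference to a specific attack.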

updated: Wed Jun 01 2022 13:41:40 GMT+0000 (UTC)

published: Wed Jun 01 2022 13:41:40 GMT+0000 (UTC)

arXiv
