EAD: an ensemble approach to detect adversarial examples from the hidden features of deep neural networks

Francesco Craighero; Fabrizio Angaroni; Fabio Stella; Chiara Damiani; Marco Antoniotti; Alex Graudenzi



One of the key challenges in Deep Learning is defining effective strategies for detecting adversarial examples. To this end, we propose a novel approach named Ensemble Adversarial Detector (EAD) for identifying adversarial examples in a standard multiclass classification scenario. EAD combines multiple detectors that exploit distinct properties of the input instances in the internal representation of a pre-trained Deep Neural Network (DNN). Specifically, EAD integrates the state-of-the-art detectors based on Mahalanobis distance and on Local Intrinsic Dimensionality (LID) with a newly introduced method based on One-class Support Vector Machines (OSVMs). All constituent methods assume that the greater the distance of a test instance from the set of correctly classified training instances, the higher the probability that it is an adversarial example; they differ in how that distance is computed. To exploit the ability of the different methods to capture distinct properties of data distributions and, accordingly, to efficiently tackle the trade-off between generalization and overfitting, EAD uses the detector-specific distance scores as features of a logistic regression classifier, after the hyperparameters of each detector have been optimized independently. We evaluated EAD on distinct datasets (CIFAR-10, CIFAR-100 and SVHN) and models (ResNet and DenseNet) against four adversarial attacks (FGSM, BIM, DeepFool and CW), and compared it with competing approaches. Overall, we show that EAD achieves the best AUROC and AUPR in the large majority of the settings and comparable performance in the others. The improvement over the state of the art, and the possibility of easily extending EAD to include any arbitrary set of detectors, pave the way for a widespread adoption of ensemble approaches in the broad field of adversarial example detection.
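
The combination step described in the abstract (detector-specific scores used as features of a logistic regression classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the three score columns are synthetic stand-ins for the Mahalanobis, LID and OSVM detector outputs, drawn from Gaussians under the assumption that adversarial inputs score higher on average, and every function name here (fit_logistic, auroc, synthetic_scores) is hypothetical. Only the Python standard library is used, so the logistic regression is fitted with plain batch gradient descent.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    # Estimated probability that x is adversarial.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def auroc(scores, labels):
    """Chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def synthetic_scores(rng, n, adversarial):
    # ASSUMPTION: three placeholder detector scores per input, with
    # adversarial inputs shifted upward. Real scores would come from the
    # hidden features of a pre-trained DNN, not from a random generator.
    shift = 1.0 if adversarial else 0.0
    return [[rng.gauss(shift, 1.0) for _ in range(3)] for _ in range(n)]


rng = random.Random(0)
X_train = synthetic_scores(rng, 200, False) + synthetic_scores(rng, 200, True)
y_train = [0] * 200 + [1] * 200
X_test = synthetic_scores(rng, 200, False) + synthetic_scores(rng, 200, True)
y_test = [0] * 200 + [1] * 200

# Ensemble: logistic regression over the three detector scores.
w, b = fit_logistic(X_train, y_train)
ensemble_probs = [predict_proba(w, b, x) for x in X_test]
ensemble_auroc = auroc(ensemble_probs, y_test)

# Baseline: each detector score used on its own.
single_aurocs = [auroc([x[j] for x in X_test], y_test) for j in range(3)]

print("ensemble AUROC:", round(ensemble_auroc, 3))
print("single-detector AUROCs:", [round(a, 3) for a in single_aurocs])
```

Printing the ensemble AUROC next to the single-detector AUROCs shows the kind of comparison the abstract reports, although numbers obtained on synthetic scores say nothing about the paper's actual results. Adding a further detector in this sketch amounts to adding one more score column, which mirrors the extensibility the abstract mentions.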

updated: Thu Nov 25 2021 11:24:28 GMT+0000 (UTC)

published: Wed Nov 24 2021 17:05:26 GMT+0000 (UTC)

arXiv

