Adversarial Profiles: Detecting Out-Distribution & Adversarial Samples in Pre-trained CNNs

Arezoo Rajabi; Rakesh B. Bobba

Despite their high accuracy, Convolutional Neural Networks (CNNs) are vulnerable to adversarial and out-distribution examples. Many methods have been proposed to detect these fooling examples or to make CNNs robust against them. However, most such methods need access to a wide range of fooling examples to retrain the network or to tune detection parameters. Here, we propose a method to detect adversarial and out-distribution examples against a pre-trained CNN without retraining the CNN or needing access to a wide variety of fooling examples. To this end, we create adversarial profiles for each class using only one adversarial attack generation technique. We then wrap a detector around the pre-trained CNN that applies the created adversarial profile to each input and uses the output to decide whether or not the input is legitimate. Our initial evaluation of this approach using the MNIST dataset shows that adversarial-profile-based detection is effective in detecting at least 92% of out-distribution examples and 59% of adversarial examples.
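The detector wrapper described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the profile format or the decision rule, so everything below is an assumption made for illustration: a profile is taken to be a set of per-class perturbations, each meant to push a legitimate sample of the predicted class toward one specific target class, and an input is accepted when enough perturbations behave as expected. The names `detect`, `min_match`, `toy_predict`, and `toy_profiles` are hypothetical, and a nearest-centroid classifier stands in for the pre-trained CNN so that the example runs with NumPy alone.

```python
import numpy as np


def detect(predict, profiles, x, min_match=0.5):
    """Wrap a fixed classifier with a profile-based legitimacy check.

    predict   : callable mapping an input array to an integer class label
                (stand-in for the pre-trained CNN; never retrained here).
    profiles  : dict {source_class: {target_class: perturbation array}}
                (assumed profile format, not taken from the paper).
    min_match : hypothetical threshold on the fraction of perturbations
                that must produce their intended target class.
    """
    label = int(predict(x))
    profile = profiles[label]
    # Apply each perturbation of the predicted class's profile and count
    # how often the classifier's output lands on the intended target.
    hits = sum(int(predict(x + delta)) == target
               for target, delta in profile.items())
    score = hits / len(profile)
    return label, score, score >= min_match


# Toy stand-in for a CNN: nearest-centroid classifier over three classes.
CENTROIDS = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])


def toy_predict(x):
    return int(np.argmin(np.linalg.norm(CENTROIDS - x, axis=1)))


# Toy profiles: shift from the source centroid to the target centroid.
# In the paper's setting these would come from an adversarial attack.
toy_profiles = {
    i: {j: CENTROIDS[j] - CENTROIDS[i] for j in range(3) if j != i}
    for i in range(3)
}

if __name__ == "__main__":
    in_dist = np.array([0.5, -0.3])     # close to the class-0 centroid
    out_dist = np.array([80.0, 20.0])   # far from every centroid
    print("in-distribution :", detect(toy_predict, toy_profiles, in_dist))
    print("out-distribution:", detect(toy_predict, toy_profiles, out_dist))
```

In this toy setup the in-distribution point follows both perturbations of its class and is accepted, while the distant point keeps its original label under both perturbations and is flagged. The point of the design is that the classifier is only queried, never modified; how the real profiles are generated and how the threshold is chosen are left to the paper.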

updated: Wed Nov 18 2020 07:10:13 GMT+0000 (UTC)

published: Wed Nov 18 2020 07:10:13 GMT+0000 (UTC)

arXiv
