Utilizing Network Properties to Detect Erroneous Inputs

Matt Gorbett; Nathaniel Blanchard



Neural networks are vulnerable to a wide range of erroneous inputs such as adversarial, corrupted, out-of-distribution, and misclassified examples. In this work, we train a linear SVM classifier to detect these four types of erroneous data using the hidden and softmax feature vectors of pre-trained neural networks. Our results indicate that these faulty data types generally exhibit activation properties that are linearly separable from those of correct examples, allowing us to reject bad inputs with no extra training or overhead. We experimentally validate our findings across a diverse range of datasets, domains, pre-trained models, and adversarial attacks.
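The detection step described above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes that a hidden-layer activation vector and the logits have already been extracted from a pre-trained network for each input. Here those are replaced by a hypothetical synthetic dataset (`make_toy_dataset`), and all function names are illustrative. The linear SVM is trained with Pegasos-style stochastic subgradient descent in pure Python rather than a library solver. Erroneous inputs are labeled +1 and correct inputs -1.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def detector_features(hidden, logits):
    """Detector input: hidden activations, then softmax probabilities,
    then a constant 1.0 so the bias is learned as an ordinary weight."""
    return list(hidden) + softmax(logits) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent on the
    L2-regularized hinge loss.

    X: list of feature vectors; y: labels, +1 = erroneous, -1 = correct.
    Returns the weight vector w (bias folded into the last coordinate).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink toward zero (gradient of the L2 regularizer)...
            w = [(1.0 - eta * lam) * wj for wj in w]
            # ...and take a hinge-loss step only if the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def is_erroneous(w, x):
    """Reject the input if it lies on the 'erroneous' side of the hyperplane."""
    return sum(wj * xj for wj, xj in zip(w, x)) > 0.0


def make_toy_dataset(n_per_class=20, seed=1):
    """Hypothetical stand-in for real network activations: 'correct' inputs
    have larger hidden activations and a confident softmax, 'erroneous'
    inputs have small activations and a nearly flat softmax."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        hidden = [2.0 + rng.uniform(-0.3, 0.3) for _ in range(2)]
        X.append(detector_features(hidden, [5.0, 0.0, 0.0]))
        y.append(-1)
        hidden = [rng.uniform(-0.3, 0.3) for _ in range(2)]
        X.append(detector_features(hidden, [0.1, 0.0, 0.0]))
        y.append(+1)
    return X, y


X, y = make_toy_dataset()
w = train_linear_svm(X, y)
accuracy = sum(is_erroneous(w, x) == (label == 1) for x, label in zip(X, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

Because the detector is a single hyperplane over features the network already computes, rejecting an input at inference time costs only one dot product. For real activations, a library solver such as scikit-learn's `sklearn.svm.LinearSVC` would normally replace the hand-written training loop.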

updated: Fri Mar 24 2023 19:01:41 GMT+0000 (UTC)

published: Fri Feb 28 2020 03:20:55 GMT+0000 (UTC)

arXiv
