An Algorithm for Out-Of-Distribution Attack to Neural Network Encoder

Liang Liang; Linhai Ma; Linchen Qian; Jiasong Chen

Deep neural networks (DNNs), especially convolutional neural networks, have achieved superior performance on image classification tasks. However, such performance is only guaranteed if the input to a trained model is similar to the training samples, i.e., the input follows the probability distribution of the training set. Out-Of-Distribution (OOD) samples do not follow the distribution of the training set, and therefore the class labels predicted for OOD samples are meaningless. Classification-based methods have been proposed for OOD detection; however, in this study we show that this type of method has no theoretical guarantee and can be broken in practice by our OOD Attack algorithm because of dimensionality reduction in the DNN models. We also show that Glow likelihood-based OOD detection can be broken as well.
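The dimensionality-reduction argument can be illustrated with a toy example. The sketch below is not the OOD Attack algorithm from the paper; it assumes a purely linear encoder (a random matrix `W`, chosen here for illustration) and uses a pseudoinverse to find the smallest change to an out-of-distribution input that makes its latent code coincide with that of an in-distribution input. All names (`collide_with_target`, `x_in`, `x_ood`, `x_adv`) and the data are hypothetical.

```python
import numpy as np


def collide_with_target(W, x_start, x_target):
    """Return x_start + delta, with the minimum-norm delta such that
    W @ (x_start + delta) == W @ x_target (assumes W has full row rank)."""
    residual = W @ x_target - W @ x_start   # gap between the two latent codes
    delta = np.linalg.pinv(W) @ residual    # minimum-norm solution of W @ delta = residual
    return x_start + delta


rng = np.random.default_rng(0)
n_input, n_latent = 64, 8                   # the encoder maps 64 dimensions to 8

# Toy linear "encoder" (an assumption; real DNN encoders are nonlinear).
W = rng.standard_normal((n_latent, n_input))

x_in = rng.standard_normal(n_input)         # stand-in for an in-distribution sample
x_ood = 5.0 + rng.standard_normal(n_input)  # stand-in for an OOD sample (shifted mean)

x_adv = collide_with_target(W, x_ood, x_in)

latent_gap = np.linalg.norm(W @ x_adv - W @ x_in)  # distance in latent space
input_gap = np.linalg.norm(x_adv - x_in)           # distance in input space

print(f"latent distance: {latent_gap:.2e}")
print(f"input distance:  {input_gap:.2f}")
```

Because `W` discards 56 of the 64 input dimensions, `x_adv` and `x_in` share essentially the same latent code while remaining far apart in input space. Any detector that operates only on the encoder output cannot tell them apart in this toy setting. A nonlinear encoder would require an iterative optimization instead of a closed-form solve.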

updated: Wed Jan 27 2021 17:58:34 GMT+0000 (UTC)

published: Thu Sep 17 2020 02:10:36 GMT+0000 (UTC)

arXiv

