Deep Neural Rejection against Adversarial Examples

Angelo Sotgiu; Ambra Demontis; Marco Melis; Battista Biggio; Giorgio Fumera; Xiaoyi Feng; Fabio Roli

Despite the impressive performance reported by deep neural networks in different application domains, they remain largely vulnerable to adversarial examples, i.e., input samples that are carefully perturbed to cause misclassification at test time. In this work, we propose a deep neural rejection mechanism to detect adversarial examples, based on the idea of rejecting samples that exhibit anomalous feature representations at different network layers. Compared to competing approaches, our method does not require generating adversarial examples at training time, and it is less computationally demanding. To properly evaluate our method, we define an adaptive white-box attack that is aware of the defense mechanism and aims to bypass it. Under this worst-case setting, we empirically show that our approach outperforms previously proposed methods that detect adversarial examples by analyzing only the feature representation provided by the output network layer.
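The abstract states the core idea (reject inputs whose representations look anomalous at several layers) without detailing the detectors. The sketch below illustrates only that rejection rule, under simplifying assumptions. To our understanding, the full paper uses learned classifiers (RBF-kernel SVMs) on the layer representations and a learned combiner. Here, each layer's scorer is replaced by a hypothetical RBF similarity to per-class mean features, and the combiner is a plain average. The helpers `fit_prototypes`, `rbf_class_scores`, and `predict_with_rejection`, as well as the `threshold` and `gamma` values, are illustrative names and settings, not the authors' code.

```python
import numpy as np

REJECT = -1  # label returned for rejected (suspected adversarial) inputs


def fit_prototypes(features, labels, n_classes):
    """Hypothetical training step: one mean feature vector per class for a single layer."""
    return np.stack([features[labels == k].mean(axis=0) for k in range(n_classes)])


def rbf_class_scores(features, prototypes, gamma=1.0):
    """Per-class scores for one layer: RBF similarity exp(-gamma * ||x - mu_k||^2).

    features:   (n_samples, d) representations taken from one network layer
    prototypes: (n_classes, d) per-class prototypes for that layer
    Returns an (n_samples, n_classes) array with values in [0, 1].
    """
    sq_dist = ((features[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)


def predict_with_rejection(layer_feats, layer_protos, threshold, gamma=1.0):
    """Combine per-layer scores; reject samples whose best combined score is below threshold."""
    per_layer = [rbf_class_scores(f, p, gamma) for f, p in zip(layer_feats, layer_protos)]
    combined = np.mean(per_layer, axis=0)  # simplified stand-in for a learned combiner
    best = combined.argmax(axis=1)
    return np.where(combined.max(axis=1) >= threshold, best, REJECT)


# Toy usage: two "layers" and two classes with well-separated training features.
train_labels = np.array([0, 0, 1, 1])
layer1_train = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.2, 5.0]])
layer2_train = 2.0 * layer1_train  # pretend a deeper layer rescales the features
protos = [fit_prototypes(f, train_labels, 2) for f in (layer1_train, layer2_train)]

# First test sample lies near class 0; second lies far from every class prototype.
test1 = np.array([[0.1, 0.0], [20.0, -20.0]])
test2 = 2.0 * test1
preds = predict_with_rejection([test1, test2], protos, threshold=0.5)
print(preds)  # first sample is classified as 0, second is rejected (-1)
```

The design point the sketch tries to convey is that rejection is driven by low support across all inspected layers rather than by the output layer alone, so an input far from every class in the intermediate representations is rejected even if some output class would nominally win.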

updated: Fri Apr 17 2020 13:42:24 GMT+0000 (UTC)

published: Tue Oct 01 2019 15:08:34 GMT+0000 (UTC)

arXiv
