DAFAR: Defending against Adversaries by Feedback-Autoencoder Reconstruction

Haowen Liu; Ping Yi; Hsiao-Ying Lin; Jie Shi; Weidong Qiu

Deep learning has shown impressive performance on challenging perceptual tasks and has been widely used in software to provide intelligent services. However, researchers have found that deep neural networks are vulnerable to adversarial examples. Since then, many methods have been proposed to defend against adversarial inputs, but they are either attack-dependent or have been shown to be ineffective against new attacks. Moreover, most existing techniques have complicated structures or mechanisms that cause prohibitively high overhead or latency, making them impractical to deploy in real software. We propose DAFAR, a feedback framework that allows deep learning models to detect or purify adversarial examples with high effectiveness and universality, at low area and time overhead. DAFAR has a simple structure containing a victim model, a plug-in feedback network, and a detector. The key idea is to import the high-level features from the victim model's feature-extraction layers into the feedback network to reconstruct the input. This data stream forms a feedback autoencoder. For strong attacks, DAFAR directly transforms the imperceptible attack on the victim model into an obvious reconstruction-error attack on the feedback autoencoder, which is much easier to detect; for weak attacks, the reformation process destroys the structure of the adversarial examples. Experiments on the MNIST and CIFAR-10 datasets show that DAFAR is effective against popular and arguably the most advanced attacks without losing performance on legitimate samples, with high effectiveness and universality across attack methods and parameters.
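The data path described in the abstract (victim features fed into a feedback network that reconstructs the input, plus a detector) can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the authors' implementation: the "feature extractor" and "feedback network" are hand-set linear maps (the hypothetical `matvec` with matrices `E` and `D`) instead of trained networks; the detector rule (flag an input whose mean squared reconstruction error exceeds a quantile of errors measured on legitimate samples, via the hypothetical `fit_threshold`) is one plausible choice that the abstract does not specify; and all class and function names are hypothetical. The sketch only shows how a perturbation the encoder cannot represent surfaces as reconstruction error, and how the reformation path classifies the reconstruction rather than the raw input.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Dense matrix-vector product (toy stand-in for a neural-network layer)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def argmax(v: Sequence[float]) -> int:
    """Index of the largest score (toy stand-in for the victim's classifier head)."""
    return max(range(len(v)), key=lambda i: v[i])


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between an input and its reconstruction."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


class FeedbackAutoencoder:
    """Hypothetical wiring of a victim model and a plug-in feedback network."""

    def __init__(self,
                 extract: Callable[[Vector], Vector],
                 classify: Callable[[Vector], int],
                 feedback: Callable[[Vector], Vector]) -> None:
        self.extract = extract    # victim model's feature-extraction layers (encoder)
        self.classify = classify  # victim model's remaining classification layers
        self.feedback = feedback  # plug-in feedback network (decoder)

    def reconstruct(self, x: Vector) -> Vector:
        # High-level features from the victim model are fed back to rebuild x.
        return self.feedback(self.extract(x))

    def reconstruction_error(self, x: Vector) -> float:
        return mse(x, self.reconstruct(x))

    def purify_and_classify(self, x: Vector) -> int:
        # Reformation path: classify the reconstruction instead of the raw input.
        return self.classify(self.extract(self.reconstruct(x)))


def fit_threshold(clean_errors: Sequence[float], quantile: float = 0.95) -> float:
    """Assumed rule: pick a quantile of reconstruction errors on legitimate samples."""
    s = sorted(clean_errors)
    idx = min(len(s) - 1, max(0, math.ceil(quantile * len(s)) - 1))
    return s[idx]


def is_adversarial(model: FeedbackAutoencoder, x: Vector, threshold: float) -> bool:
    """Detector: flag inputs whose reconstruction error exceeds the threshold."""
    return model.reconstruction_error(x) > threshold


if __name__ == "__main__":
    # Toy data: legitimate inputs lie (almost) on the plane z = 0.
    E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # "feature extractor": keeps (x, y)
    D = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # "feedback network": maps back with z = 0
    model = FeedbackAutoencoder(lambda v: matvec(E, v), argmax, lambda f: matvec(D, f))

    clean = [[1.0, 0.2, 0.0], [0.3, 0.9, 0.0], [0.5, 0.4, 0.01]]
    thr = fit_threshold([model.reconstruction_error(v) for v in clean])
    perturbed = [0.5, 0.4, 0.3]  # off-manifold perturbation the encoder cannot represent
    print(is_adversarial(model, clean[0], thr))   # False
    print(is_adversarial(model, perturbed, thr))  # True
    print(model.purify_and_classify(perturbed))   # 0
```

In this linear toy, re-extracting features from the reconstruction yields the same features as the raw input, so the reformation path cannot change the label here; it only shows where reformation sits in the data flow. With trained nonlinear networks, the abstract's claim is that this reconstruction step is what destroys the structure of weak adversarial examples.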

updated: Wed Mar 17 2021 14:49:12 GMT+0000 (UTC)

published: Thu Mar 11 2021 06:18:50 GMT+0000 (UTC)

arXiv
