Mitigating the Impact of Adversarial Attacks in Very Deep Networks

Mohammed Hassanin; Ibrahim Radwan; Nour Moustafa; Murat Tahtali; Neeraj Kumar

Deep Neural Network (DNN) models have security vulnerabilities, and attackers usually employ complex hacking techniques to expose their structures. Data poisoning-enabled perturbation attacks are complex adversarial attacks that inject false data into models. They harm the learning process by degrading a model's accuracy and convergence rate, and greater network depth offers no benefit against them. In this paper, we propose an attack-agnostic defense method for mitigating their influence. In this method, a Defensive Feature Layer (DFL) is integrated with a well-known DNN architecture and helps neutralize the effects of illegitimate perturbation samples in the feature space. To boost the robustness and trustworthiness of the method in correctly classifying attacked input samples, we regularize the hidden space of a trained model with a discriminative loss function called Polarized Contrastive Loss (PCL). PCL improves discrimination among samples of different classes and preserves the similarity of samples within the same class. We also integrate the DFL and PCL into a compact model for defending against data poisoning attacks. The method is trained and tested on the CIFAR-10 and MNIST datasets under data poisoning-enabled perturbation attacks, and the experimental results show that it performs very well compared with recent peer techniques.
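
The abstract does not give the formula for PCL, so the following is only a rough illustration of the general idea it describes: pulling same-class features together and pushing different-class features apart. It is a minimal sketch of a generic margin-based pairwise contrastive loss, not the authors' Polarized Contrastive Loss; the function names, the `margin` parameter, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_contrastive_loss(features, labels, margin=1.0):
    """Generic margin-based contrastive loss averaged over all pairs.

    NOTE: illustrative stand-in only, NOT the paper's PCL.
    - Same-class pairs are penalized by their squared distance.
    - Different-class pairs are penalized only when closer than `margin`
      (hypothetical hyperparameter).
    """
    total, count = 0.0, 0
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(features[i], features[j])
            if labels[i] == labels[j]:
                total += d ** 2                      # pull together
            else:
                total += max(0.0, margin - d) ** 2   # push apart
            count += 1
    return total / count if count else 0.0
```

In a sketch like this, the loss would typically be added to the usual classification loss as a regularizer on the hidden-layer features; how the paper actually combines PCL with the DFL is not stated in the abstract.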

updated: Tue Dec 08 2020 21:25:44 GMT+0000 (UTC)

published: Tue Dec 08 2020 21:25:44 GMT+0000 (UTC)

arXiv
