Learning to Detect Adversarial Examples Based on Class Scores

Tobias Uelwer; Felix Michels; Oliver De Candido

Given the increasing threat of adversarial attacks on deep neural networks (DNNs), research on efficient detection methods is more important than ever. In this work, we take a closer look at adversarial attack detection based on the class scores of an already trained classification model. We propose to train a support vector machine (SVM) on the class scores to detect adversarial examples. Our method is able to detect adversarial examples generated by various attacks and can easily be adapted to a plethora of deep classification models. We show that our approach yields an improved detection rate compared to an existing method while being easy to implement. We perform an extensive empirical analysis on different deep classification models, investigating various state-of-the-art adversarial attacks. Moreover, we observe that our proposed method is better at detecting a combination of adversarial attacks. This work indicates the potential of detecting various adversarial attacks simply by using the class scores of an already trained classification model.
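The core idea in the abstract, training an SVM on the class scores of a fixed classifier to separate clean from adversarial inputs, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the class scores are synthetic (`toy_class_scores` is a hypothetical stand-in for a trained model's softmax output, and it simply assumes adversarial inputs win by a smaller margin), the scores are sorted before training so that a linear model suffices, and the SVM is a plain linear one fitted by stochastic subgradient descent on the hinge loss rather than whatever kernel or solver the authors used.

```python
import math
import random


def softmax(logits):
    """Turn raw logits into class scores that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_class_scores(rng, adversarial, n_classes=10):
    """Hypothetical stand-in for a trained classifier's output (synthetic data)."""
    logits = [rng.gauss(0.0, 1.0) for _ in range(n_classes)]
    winner = rng.randrange(n_classes)
    # Toy assumption: clean inputs win by a large margin, adversarial ones by a small one.
    logits[winner] += 1.0 if adversarial else 10.0
    return softmax(logits)


def to_features(scores):
    """Sort scores in descending order so the detector ignores which class won."""
    return sorted(scores, reverse=True)


def make_dataset(n_per_class, seed):
    """Build a balanced set of feature vectors; label -1 is clean, +1 is adversarial."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        X.append(to_features(toy_class_scores(rng, adversarial=False)))
        y.append(-1)
        X.append(to_features(toy_class_scores(rng, adversarial=True)))
        y.append(+1)
    return X, y


def decision_value(w, b, x):
    """Signed score of the linear detector; positive means 'adversarial'."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=1e-3, lr=0.05, epochs=40, seed=0):
    """Linear SVM via stochastic subgradient descent on the L2-regularised hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision_value(w, b, X[i]) < 1.0
            # Shrink the weights (regularisation step); the bias is not regularised.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if violated:
                # Hinge-loss subgradient step for a sample inside the margin.
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def detection_accuracy(w, b, X, y):
    """Fraction of samples whose predicted side of the hyperplane matches the label."""
    hits = sum(1 for x, label in zip(X, y)
               if (decision_value(w, b, x) > 0) == (label > 0))
    return hits / len(X)


def run_demo():
    """Train on one synthetic split and report accuracy on a held-out one."""
    X_train, y_train = make_dataset(200, seed=1)
    X_test, y_test = make_dataset(100, seed=2)
    w, b = train_linear_svm(X_train, y_train)
    return detection_accuracy(w, b, X_test, y_test)


if __name__ == "__main__":
    print(f"held-out detection accuracy on toy data: {run_demo():.3f}")
```

The detector never touches the classifier's weights or inputs; it only consumes the score vector, which is what makes this kind of approach easy to attach to an already trained model. With a real model, `toy_class_scores` would be replaced by the model's actual outputs on clean inputs and on examples produced by the attacks of interest, and the accuracy on this synthetic data says nothing about real detection rates.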

updated: Fri Jul 09 2021 13:29:54 GMT+0000 (UTC)

published: Fri Jul 09 2021 13:29:54 GMT+0000 (UTC)

arXiv
