Two Souls in an Adversarial Image: Towards Universal Adversarial Example Detection using Multi-view Inconsistency

Sohaib Kiani; Sana Awan; Chao Lan; Fengjun Li; Bo Luo

In evasion attacks against deep neural networks (DNNs), the attacker generates adversarial instances that are visually indistinguishable from benign samples and sends them to the target DNN to trigger misclassifications. In this paper, we propose a novel multi-view adversarial image detector, Argos, based on a new observation: there exist two "souls" in an adversarial instance, namely the visually unchanged content, which corresponds to the true label, and the added invisible perturbation, which corresponds to the misclassified label. This inconsistency can be further amplified through an autoregressive generative approach that generates images from seed pixels selected from the original image, a selected label, and pixel distributions learned from the training data. The generated images (i.e., the "views") will deviate significantly from the original one if the label is adversarial, demonstrating inconsistencies that Argos is designed to detect. To this end, Argos first uses a set of regeneration mechanisms to amplify the attack-induced discrepancies between the visual content of an image and its misclassified label, and then identifies an image as adversarial if the regenerated views deviate from it beyond a preset degree. Our experimental results show that Argos significantly outperforms two representative adversarial detectors in both detection accuracy and robustness against six well-known adversarial attacks. Code is available at: https://github.com/sohaib730/Argos-Adversarial_Detection
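The decision rule described in the abstract (regenerate views conditioned on the predicted label, then flag the image if the views deviate too far) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the deviation measure or the threshold, so the mean absolute pixel difference, the `threshold` parameter, and the toy `prototype_regenerator` (a stand-in for the autoregressive generator) are all assumptions made purely for illustration.

```python
from statistics import mean
from typing import Callable, Sequence

Image = Sequence[float]  # flattened pixel intensities in [0, 1]
Regenerator = Callable[[Image, int], Image]  # (image, label) -> regenerated view


def view_deviation(original: Image, view: Image) -> float:
    """Mean absolute pixel difference (an assumed metric, not from the paper)."""
    if len(original) != len(view):
        raise ValueError("image and view must have the same number of pixels")
    return mean(abs(a - b) for a, b in zip(original, view))


def is_adversarial(
    original: Image,
    predicted_label: int,
    regenerators: Sequence[Regenerator],
    threshold: float,
) -> bool:
    """Flag the image if its label-conditioned views deviate beyond the threshold."""
    views = [regen(original, predicted_label) for regen in regenerators]
    score = mean(view_deviation(original, v) for v in views)
    return score > threshold


# Hypothetical stand-in for the generative model: keep the first half of the
# pixels as "seed" and fill the rest from a fixed per-label prototype value.
PROTOTYPE_PIXEL = {0: 0.0, 1: 1.0}


def prototype_regenerator(image: Image, label: int) -> Image:
    half = len(image) // 2
    return list(image[:half]) + [PROTOTYPE_PIXEL[label]] * (len(image) - half)


dark_image = [0.0] * 8  # visual content consistent with label 0

# Label agrees with the content: the view matches the image, so no flag.
benign_flag = is_adversarial(dark_image, 0, [prototype_regenerator], 0.25)
# Label disagrees with the content: the view drifts away, so the image is flagged.
attacked_flag = is_adversarial(dark_image, 1, [prototype_regenerator], 0.25)
```

In this toy setting, a label that matches the visual content reproduces the image exactly, while a mismatched label pulls the regenerated half toward a different prototype, which is the kind of amplified inconsistency the abstract describes.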

updated: Mon Oct 11 2021 15:59:10 GMT+0000 (UTC)

published: Sat Sep 25 2021 23:47:13 GMT+0000 (UTC)

arXiv
