Attack Agnostic Detection of Adversarial Examples via Random Subspace Analysis

Nathan Drenkow; Neil Fendley; Philippe Burlina

Although adversarial attack detection has received considerable attention, it remains a fundamentally challenging problem for two reasons. First, while threat models can be well defined, attacker strategies may still vary widely within those constraints. Detection should therefore be treated as an open-set problem, in contrast to most current approaches, which take a closed-set view and train binary detectors, thereby biasing detection toward the attacks seen during detector training. Second, limited information is available at test time, and it is typically confounded by nuisance factors such as the label and the underlying content of the image. We address these challenges with a novel strategy based on random subspace analysis. We present a technique that uses properties of random projections to characterize the behavior of clean and adversarial examples across a diverse set of subspaces, and that leverages the self-consistency (or inconsistency) of model activations to distinguish clean from adversarial examples. Performance evaluations show that our technique (AUC ∈ [0.92, 0.98]) outperforms competing detection strategies (AUC ∈ [0.30, 0.79]) while remaining truly agnostic to the attack strategy, for both targeted and untargeted attacks. It also requires significantly less calibration data (composed only of clean examples) than competing approaches to achieve this performance.
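The abstract does not spell out the algorithm, so the following is only a rough sketch of the general idea of checking activation consistency across random subspaces, not the authors' method. It assumes Gaussian random projections, nearest-class-centroid voting in each subspace, and a disagreement rate as the score. The function names (`make_projections`, `calibrate`, `inconsistency_score`) and the synthetic "activations" are all hypothetical.

```python
import numpy as np

def make_projections(dim, sub_dim, n_subspaces, rng):
    # Gaussian random projection matrices, scaled so that norms are
    # roughly preserved in expectation. Shape: (n_subspaces, dim, sub_dim).
    return rng.normal(size=(n_subspaces, dim, sub_dim)) / np.sqrt(sub_dim)

def calibrate(clean_acts, clean_labels, projections):
    # Assumption: calibration uses clean examples only, summarised as
    # one class centroid per subspace.
    classes = np.unique(clean_labels)
    projected = np.einsum("nd,kds->kns", clean_acts, projections)
    centroids = np.stack(
        [projected[:, clean_labels == c].mean(axis=1) for c in classes],
        axis=1,
    )  # shape: (n_subspaces, n_classes, sub_dim)
    return classes, centroids

def inconsistency_score(act, projections, classes, centroids):
    # Project one activation vector into every subspace and let each
    # subspace vote for its nearest class centroid.
    z = np.einsum("d,kds->ks", act, projections)
    dists = np.linalg.norm(centroids - z[:, None, :], axis=2)
    votes = classes[dists.argmin(axis=1)]
    _, counts = np.unique(votes, return_counts=True)
    # 0.0 means all subspaces agree; larger values mean more disagreement.
    return 1.0 - counts.max() / len(votes)

# Synthetic stand-in for model activations: two well-separated classes.
rng = np.random.default_rng(0)
dim, sub_dim, n_subspaces = 32, 8, 16
projections = make_projections(dim, sub_dim, n_subspaces, rng)
means = np.stack([np.zeros(dim), 10.0 * np.ones(dim)])
labels = np.repeat([0, 1], 50)
acts = means[labels] + rng.normal(size=(100, dim))
classes, centroids = calibrate(acts, labels, projections)

# A point at a class mean versus a point halfway between the two classes.
clean_score = inconsistency_score(means[0], projections, classes, centroids)
boundary_score = inconsistency_score(
    means.mean(axis=0), projections, classes, centroids
)
```

In this toy setup a detector would flag inputs whose score exceeds a threshold chosen on clean calibration data. Because no adversarial examples are used to fit anything, the sketch is attack agnostic in the same spirit as the abstract, though the paper's actual statistic may differ.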

updated: Wed Nov 03 2021 15:07:58 GMT+0000 (UTC)

published: Fri Dec 11 2020 15:02:28 GMT+0000 (UTC)

arXiv
