On Trace of PGD-Like Adversarial Attacks

Mo Zhou; Vishal M. Patel

Adversarial attacks pose safety and security concerns for deep learning applications. Although largely imperceptible, a strong PGD-like attack may leave a strong trace in the adversarial example. Since such an attack triggers the local linearity of a network, we speculate that the network behaves with different degrees of linearity for benign and adversarial examples. Thus, we construct Adversarial Response Characteristics (ARC) features that reflect the model's gradient consistency around the input and thereby indicate the degree of linearity. Under certain conditions, the ARC feature shows a gradually varying pattern from benign to adversarial examples, as the latter lead to the Sequel Attack Effect (SAE). The ARC feature can be used for informed attack detection (perturbation magnitude known) with a binary classifier, or for uninformed attack detection (perturbation magnitude unknown) with ordinal regression. Because SAE is unique to PGD-like attacks, ARC can also infer other attack details such as the loss function, or infer the ground-truth label as a post-processing defense. Qualitative and quantitative evaluations demonstrate the effectiveness of the ARC feature on CIFAR-10 with ResNet-18 and on ImageNet with ResNet-152 and SwinT-B-IN1K, with considerable generalization among PGD-like attacks despite domain shift. Our method is intuitive, lightweight, non-intrusive, and data-undemanding.
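To make "gradient consistency around the input" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. Everything in it is an assumption made for illustration: the toy two-unit tanh model (`W1`, `W2`), the logistic loss, the helper names, and the choice of cosine similarity between loss gradients collected along a few signed-gradient steps. It also omits the projection onto a perturbation ball that a real PGD-like attack would apply. The idea it illustrates is only that gradients which stay aligned along such steps suggest the model is behaving nearly linearly in that neighborhood.

```python
import math

# Hypothetical toy model (not from the paper): score(x) = W2 . tanh(W1 x)
W1 = [[0.8, -0.5, 0.3], [-0.2, 0.9, 0.4]]
W2 = [1.0, -1.5]


def loss_grad(x, y):
    """Gradient w.r.t. x of log(1 + exp(-y * score(x))), with y in {-1, +1}."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    score = sum(v * h for v, h in zip(W2, hidden))
    dloss_dscore = -y / (1.0 + math.exp(y * score))
    grad = [0.0] * len(x)
    for row, v, h in zip(W1, W2, hidden):
        coeff = dloss_dscore * v * (1.0 - h * h)  # chain rule through tanh
        for i, w in enumerate(row):
            grad[i] += coeff * w
    return grad


def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def gradient_consistency(x, y, steps=4, step_size=0.1):
    """Cosine-similarity matrix of loss gradients along signed-gradient steps."""
    grads = []
    current = list(x)
    for _ in range(steps):
        g = loss_grad(current, y)
        grads.append(g)
        # Signed-gradient ascent step; no projection, to keep the sketch short.
        current = [xi + step_size * math.copysign(1.0, gi)
                   for xi, gi in zip(current, g)]
    return [[cosine(a, b) for b in grads] for a in grads]


if __name__ == "__main__":
    for row in gradient_consistency([0.2, -0.1, 0.5], y=1):
        print(" ".join(f"{v:+.3f}" for v in row))
```

Off-diagonal entries close to 1 mean the gradient direction barely changes along the steps. A detector in the spirit of the abstract would feed a summary of such a matrix to a binary classifier or an ordinal regressor, but how the paper summarizes it is not specified here.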

updated: Thu May 19 2022 14:26:50 GMT+0000 (UTC)

published: Thu May 19 2022 14:26:50 GMT+0000 (UTC)

arXiv
