(De)Randomized Smoothing for Certifiable Defense against Patch Attacks

Alexander Levine; Soheil Feizi

Patch adversarial attacks on images, in which the attacker can distort pixels within a region of bounded size, are an important threat model because they provide a quantitative model for physical adversarial attacks. In this paper, we introduce a certifiable defense against patch attacks that guarantees that, for a given image and patch attack size, no patch adversarial examples exist. Our method is related to the broad class of randomized smoothing robustness schemes, which provide high-confidence probabilistic robustness certificates. By exploiting the fact that patch attacks are more constrained than general sparse attacks, we derive meaningfully large robustness certificates against them. Additionally, in contrast to smoothing-based defenses against L_p and sparse attacks, our defense method against patch attacks is de-randomized, yielding improved, deterministic certificates. Compared to the existing patch certification method proposed by Chiang et al. (2020), which relies on interval bound propagation, our method can be trained significantly faster, achieves high clean and certified robust accuracy on CIFAR-10, and provides certificates at ImageNet scale. For example, for a 5-by-5 patch attack on CIFAR-10, our method achieves up to around 57.6% certified accuracy (with a classifier with around 83.8% clean accuracy), compared to at most 30.3% certified accuracy for the existing method (with a classifier with around 47.8% clean accuracy). Our results effectively establish a new state of the art in certifiable defense against patch attacks on CIFAR-10 and ImageNet. Code is available at https://github.com/alevine0/patchSmoothing.
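
The abstract does not spell out the certification rule, so the following is only a rough sketch of how a deterministic, vote-based patch certificate of this kind might work. It assumes (and this is an assumption, not something stated above) that a base classifier is evaluated once on each vertical band of the image, with bands wrapping around the image edge, and that the smoothed prediction is the majority vote over bands. Under that assumption, a patch of width `patch_size` can overlap at most `patch_size + band_size - 1` band positions, so it can change at most that many votes. The function names, the tie-breaking convention (lower class index wins), and the example numbers are all hypothetical and are not taken from the authors' repository.

```python
from typing import List, Tuple


def max_affected_bands(patch_size: int, band_size: int) -> int:
    """Upper bound on how many band positions a square patch can overlap.

    Assumes one band per image column, each band_size columns wide,
    with wrap-around at the image edge (an assumption of this sketch).
    """
    return patch_size + band_size - 1


def certify_votes(
    votes: List[int], num_classes: int, patch_size: int, band_size: int
) -> Tuple[int, bool]:
    """Return (predicted_class, certified) from per-band class votes.

    votes[i] is the class the (hypothetical) base classifier predicts
    when it sees only band i of the image.
    """
    counts = [0] * num_classes
    for v in votes:
        counts[v] += 1

    # Majority vote; ties are broken in favour of the lower class index.
    top = max(range(num_classes), key=lambda c: (counts[c], -c))

    delta = max_affected_bands(patch_size, band_size)

    # Worst case: the attacker moves delta votes from `top` to a rival c.
    # `top` still wins if its margin exceeds 2 * delta, with one extra
    # vote needed whenever the rival would win a tie (c < top).
    certified = all(
        counts[top] >= counts[c] + (1 if c < top else 0) + 2 * delta
        for c in range(num_classes)
        if c != top
    )
    return top, certified


# Illustrative, made-up votes for a 32-column image with 10 classes.
strong_votes = [3] * 25 + [1] * 7
weak_votes = [3] * 20 + [1] * 12
strong_result = certify_votes(strong_votes, 10, patch_size=5, band_size=4)
weak_result = certify_votes(weak_votes, 10, patch_size=5, band_size=4)
```

Because every band is evaluated exactly once rather than sampled, the check involves no sampling error, which is one way to read the claim that the certificates are deterministic rather than probabilistic.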

updated: Tue Dec 08 2020 19:09:10 GMT+0000 (UTC)

published: Tue Feb 25 2020 08:39:46 GMT+0000 (UTC)

arXiv
