Mind the box: l_1-APGD for sparse adversarial attacks on image classifiers

Francesco Croce; Matthias Hein

We show that when the image domain [0,1]^d is also taken into account, established l_1-projected gradient descent (PGD) attacks are suboptimal, as they do not consider that the effective threat model is the intersection of the l_1-ball and [0,1]^d. We study the expected sparsity of the steepest descent step for this effective threat model and show that the exact projection onto this set is computationally feasible and yields better performance. Moreover, we propose an adaptive form of PGD which is highly effective even with a small budget of iterations. The resulting l_1-APGD is a strong white-box attack, showing that prior works overestimated their l_1-robustness. Using l_1-APGD for adversarial training, we obtain a robust classifier with state-of-the-art (SOTA) l_1-robustness. Finally, we combine l_1-APGD and an adaptation of the Square Attack to l_1 into l_1-AutoAttack, an ensemble of attacks which reliably assesses adversarial robustness for the threat model of the l_1-ball intersected with [0,1]^d.
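To make the "effective threat model" concrete, the following is a minimal sketch of a Euclidean projection onto the intersection of the l_1-ball of radius eps around a clean image x and the box [0,1]^d. The abstract does not spell out the procedure, so this is not necessarily the authors' algorithm: the sketch assumes a simple bisection on the Lagrange multiplier of the l_1 constraint, and the function name `project_l1_box` and its parameters are hypothetical. It also assumes x itself lies in [0,1]^d.

```python
import numpy as np

def project_l1_box(y, x, eps, iters=60):
    """Sketch: project y onto {z : ||z - x||_1 <= eps, 0 <= z <= 1}.

    Assumes x is in [0,1]^d and eps >= 0. Uses bisection on the
    multiplier lam of the l_1 constraint (an illustrative choice).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    u = y - x                      # unconstrained perturbation
    low, high = -x, 1.0 - x        # box bounds on the perturbation

    def delta(lam):
        # Per coordinate: soft-threshold by lam, then clip to the box.
        shrunk = np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)
        return np.clip(shrunk, low, high)

    d0 = delta(0.0)
    if np.abs(d0).sum() <= eps:    # box clipping alone is already feasible
        return x + d0

    # ||delta(lam)||_1 is non-increasing in lam and is 0 at lam = max|u|.
    lo, hi = 0.0, np.abs(u).max()
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.abs(delta(mid)).sum() > eps:
            lo = mid
        else:
            hi = mid
    return x + delta(hi)           # hi always satisfies the l_1 budget

# Example: a pixel near the upper bound limits how much budget it can absorb.
z = project_l1_box([1.5, 0.6], [0.9, 0.5], eps=0.3)
```

In this example, projecting onto the l_1-ball first and clipping to [0,1]^d afterwards would leave part of the l_1 budget unused, whereas the joint projection keeps the perturbation on both coordinates. This is the kind of gap the abstract attributes to PGD attacks that ignore the box constraint.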

updated: Fri Dec 03 2021 12:27:03 GMT+0000 (UTC)

published: Mon Mar 01 2021 18:53:32 GMT+0000 (UTC)

arXiv
