Robustness Out of the Box: Compositional Representations Naturally Defend Against Black-Box Patch Attacks

Christian Cosgrove; Adam Kortylewski; Chenglin Yang; Alan Yuille

Patch-based adversarial attacks introduce a perceptible but localized change to the input that induces misclassification. While progress has been made in defending against imperceptible attacks, it remains unclear how patch-based attacks can be resisted. In this work, we study two different approaches for defending against black-box patch attacks. First, we show that adversarial training, which is successful against imperceptible attacks, has limited effectiveness against state-of-the-art location-optimized patch attacks. Second, we find that compositional deep networks, which have part-based representations that lead to innate robustness to natural occlusion, are robust to patch attacks on PASCAL3D+ and the German Traffic Sign Recognition Benchmark (GTSRB), without adversarial training. Moreover, the robustness of compositional models outperforms that of adversarially trained standard models by a large margin. However, on GTSRB, we observe that compositional models have problems discriminating between similar traffic signs with fine-grained differences. We overcome this limitation by introducing part-based finetuning, which improves fine-grained recognition. By leveraging compositional representations, this is the first work that defends against black-box patch attacks without expensive adversarial training. This defense is more robust than adversarial training and more interpretable because it can locate and ignore adversarial patches.
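To make the threat model concrete, the following is a minimal sketch of a black-box patch attack that optimizes only the patch location. It is an illustration under stated assumptions, not the attack evaluated in the paper: plain random search stands in for the location-optimized attacks the abstract refers to, and `apply_patch`, `search_patch_location`, `score_fn`, and `toy_scores` are hypothetical names introduced here. The attacker only queries class scores and never uses gradients, which is what makes the setting black-box.

```python
import random


def apply_patch(image, patch, top, left):
    """Return a copy of `image` (a list of rows) with `patch` pasted at (top, left)."""
    out = [row[:] for row in image]
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            out[top + i][left + j] = value
    return out


def search_patch_location(score_fn, image, patch, true_label, n_queries=200, seed=0):
    """Random search over patch locations using only model queries.

    `score_fn(image)` is an assumed query interface that returns a list of
    class scores. Returns (top, left, score), where `score` is the lowest
    true-class score found among the sampled locations.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    patch_h, patch_w = len(patch), len(patch[0])
    best = None
    for _ in range(n_queries):
        top = rng.randint(0, height - patch_h)
        left = rng.randint(0, width - patch_w)
        score = score_fn(apply_patch(image, patch, top, left))[true_label]
        if best is None or score < best[2]:
            best = (top, left, score)
    return best


def toy_scores(image):
    """Toy stand-in for a classifier: class 1 evidence is the mean
    intensity of the top-left 4x4 quadrant; class 0 gets the remainder."""
    region = [image[i][j] for i in range(4) for j in range(4)]
    s = sum(region) / len(region)
    return [1.0 - s, s]


image = [[1.0] * 8 for _ in range(8)]   # 8x8 all-white toy input
patch = [[0.0] * 2 for _ in range(2)]   # 2x2 black patch
clean_score = toy_scores(image)[1]
top, left, attacked_score = search_patch_location(toy_scores, image, patch, true_label=1)
```

In this toy setup the search tends to move the patch over the region the classifier relies on, which lowers the true-class score. A model that can locate and ignore the occluded region, as the abstract claims for compositional networks, would be less sensitive to where the patch lands.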

updated: Tue Dec 01 2020 15:04:23 GMT+0000 (UTC)

published: Tue Dec 01 2020 15:04:23 GMT+0000 (UTC)

arXiv
