Zero-Shot Certified Defense against Adversarial Patches with Vision Transformers

Yuheng Huang; Yuanchun Li

Adversarial patch attacks aim to fool a machine learning model by arbitrarily modifying pixels within a restricted region of an input image. Such attacks are a major threat to models deployed in the physical world, because they can be realized simply by presenting a customized object in the camera view. Defending against them is challenging because the patch contents are arbitrary, and existing provable defenses suffer from poor certified accuracy. In this paper, we propose PatchVeto, a zero-shot certified defense against adversarial patches based on Vision Transformer (ViT) models. Rather than training a robust model to resist adversarial patches, which may sacrifice accuracy, PatchVeto reuses a pretrained ViT model without any additional training. It achieves high accuracy on clean inputs while detecting adversarially patched inputs simply by manipulating the attention map of the ViT. Specifically, each input is tested by voting over multiple inferences with different attention masks, at least one of which is guaranteed to exclude the adversarial patch. The prediction is certifiably robust if all masked inferences reach consensus, which ensures that any adversarial patch is detected with no false negatives. Extensive experiments show that PatchVeto achieves high certified accuracy (e.g., 67.1% on ImageNet for 2%-pixel adversarial patches), significantly outperforming state-of-the-art methods. The clean accuracy is the same as that of vanilla ViT models (81.8% on ImageNet), since the model parameters are reused directly. Moreover, our method can flexibly handle different adversarial patch sizes by simply changing the masking strategy.
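
The voting scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `covering_masks` and `veto_predict`, the choice to work in units of ViT tokens on a square grid, the square sliding-window masks, and the `model(image, masked_tokens)` callable are all assumptions made for illustration. The sketch relies on a simple covering argument: if windows of side `patch + stride - 1` are placed every `stride` tokens, then any `patch x patch` block of tokens lies entirely inside at least one window, so at least one masked inference never sees the patch.

```python
from collections import Counter
from itertools import product


def covering_masks(grid, patch, stride):
    """Return a list of masks, each a frozenset of (row, col) token coordinates.

    Hypothetical helper: each mask is a square window of side
    patch + stride - 1 on a grid x grid token map, clipped at the border.
    Any patch x patch block of tokens falls fully inside some window.
    """
    window = patch + stride - 1
    # Window origins must reach the last possible patch origin (grid - patch).
    starts = range(0, grid - patch + 1, stride)
    masks = []
    for r0, c0 in product(starts, starts):
        rows = range(r0, min(r0 + window, grid))
        cols = range(c0, min(c0 + window, grid))
        masks.append(frozenset(product(rows, cols)))
    return masks


def veto_predict(model, image, masks):
    """Run one masked inference per mask and vote.

    `model(image, masked_tokens)` is an assumed callable returning a label
    while ignoring the masked tokens. The result is flagged as certified
    only when every masked inference returns the same label.
    """
    votes = [model(image, mask) for mask in masks]
    label = Counter(votes).most_common(1)[0][0]
    certified = len(set(votes)) == 1
    return label, certified
```

With a real ViT, the masking would presumably be applied through the attention mechanism so that masked tokens are never attended to; here that detail is hidden behind the assumed `model` callable. A patch given in pixels would also first need to be converted to the set of tokens it can overlap, which this sketch does not cover.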

updated: Fri Nov 19 2021 23:45:23 GMT+0000 (UTC)

published: Fri Nov 19 2021 23:45:23 GMT+0000 (UTC)

arXiv
