Are Vision Transformers Robust to Patch Perturbations?

Jindong Gu; Volker Tresp; Yao Qin

Recent advances in the Vision Transformer (ViT) have demonstrated impressive performance in image classification, making it a promising alternative to Convolutional Neural Networks (CNNs). Unlike CNNs, ViT represents an input image as a sequence of image patches. This patch-wise representation raises an interesting question: compared to CNNs, how does ViT perform when individual input image patches are perturbed with natural corruptions or adversarial perturbations? In this work, we study the robustness of vision transformers to patch-wise perturbations. Surprisingly, we find that vision transformers are more robust than CNNs to naturally corrupted patches, whereas they are more vulnerable to adversarial patches. Furthermore, we conduct extensive qualitative and quantitative experiments to understand this robustness to patch perturbations. We find that ViT's stronger robustness to naturally corrupted patches and its higher vulnerability to adversarial patches are both caused by the attention mechanism. Specifically, the attention mechanism can improve the robustness of vision transformers by effectively ignoring naturally corrupted patches. However, when vision transformers are attacked by an adversary, the attention mechanism can be easily fooled into focusing more on the adversarially perturbed patches, causing a misclassification.
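
To make the notion of a patch-wise perturbation concrete, the following is a minimal sketch of a natural-style corruption applied to a single patch. It is not the authors' code: the function name `corrupt_patch`, the 16-pixel patch size, the Gaussian noise model, and the [0, 1] pixel range are all assumptions chosen for illustration.

```python
import numpy as np


def corrupt_patch(image, patch_size, patch_index, noise_std=0.5, seed=0):
    """Return a float32 copy of `image` with Gaussian noise added to one patch.

    Hypothetical helper for illustration only. `image` is an (H, W, C) array
    with values assumed to lie in [0, 1]. Patches are numbered row by row on
    a non-overlapping grid of `patch_size` x `patch_size` cells.
    """
    height, width = image.shape[:2]
    if height % patch_size or width % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")

    rows, cols = height // patch_size, width // patch_size
    if not 0 <= patch_index < rows * cols:
        raise IndexError("patch_index is outside the patch grid")

    # Convert the flat patch index into pixel coordinates of the patch corner.
    row, col = divmod(patch_index, cols)
    top, left = row * patch_size, col * patch_size

    rng = np.random.default_rng(seed)
    out = image.astype(np.float32, copy=True)

    # `patch` is a view into `out`, so the in-place operations below modify
    # only the selected patch and leave every other pixel untouched.
    patch = out[top:top + patch_size, left:left + patch_size]
    patch += rng.normal(0.0, noise_std, size=patch.shape)
    np.clip(patch, 0.0, 1.0, out=patch)
    return out


if __name__ == "__main__":
    clean = np.full((224, 224, 3), 0.5, dtype=np.float32)
    noisy = corrupt_patch(clean, patch_size=16, patch_index=0)
    changed = np.any(noisy != clean, axis=-1)
    print("pixels changed:", int(changed.sum()))
```

An adversarial patch would replace the random noise with a perturbation optimized against a specific model, which this sketch does not attempt.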

updated: Sat Nov 20 2021 19:00:51 GMT+0000 (UTC)

published: Sat Nov 20 2021 19:00:51 GMT+0000 (UTC)

arXiv
