Vision Transformers are Robust Learners

Sayak Paul; Pin-Yu Chen

Transformers, composed of multiple self-attention layers, hold strong promise as a generic learning primitive applicable to different data modalities, including recent breakthroughs in computer vision that achieve state-of-the-art (SOTA) standard accuracy. What remains largely unexplored is the evaluation and attribution of their robustness. In this work, we study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples. We use six diverse ImageNet datasets concerning robust classification to conduct a comprehensive performance comparison of ViT models and a SOTA convolutional neural network (CNN), Big-Transfer (BiT). Through a series of six systematically designed experiments, we then present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners. For example, with fewer parameters and similar dataset and pre-training combinations, ViT gives a top-1 accuracy of 28.10% on ImageNet-A, which is 4.3x higher than a comparable variant of BiT. Our analyses on image masking, Fourier spectrum sensitivity, and spread on the discrete cosine energy spectrum reveal intriguing properties of ViT that contribute to improved robustness. Code for reproducing our experiments is available at https://git.io/J3VO0.
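To make two of the terms above concrete, the following is a minimal sketch of how top-1 accuracy and a patch-masking perturbation could be computed. It is not the authors' code (that is at the URL above); the function names `top1_accuracy` and `mask_patches`, the zero fill value, and the use of plain nested lists are all assumptions chosen for illustration.

```python
import random


def top1_accuracy(scores, labels):
    """Fraction of examples whose highest-scoring class equals the true label.

    scores: list of per-class score lists, one per example (hypothetical format).
    labels: list of integer class indices.
    """
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += predicted == label
    return correct / len(labels)


def mask_patches(image, patch_size, fraction, seed=0):
    """Return a copy of a 2D image with a random fraction of patches set to 0.

    The image is split into non-overlapping patch_size x patch_size patches;
    round(fraction * number_of_patches) of them are chosen at random and zeroed.
    Zero fill is an assumption of this sketch, not something the abstract states.
    """
    height, width = len(image), len(image[0])
    corners = [
        (r, c)
        for r in range(0, height, patch_size)
        for c in range(0, width, patch_size)
    ]
    rng = random.Random(seed)
    chosen = rng.sample(corners, round(fraction * len(corners)))

    masked = [list(row) for row in image]  # copy so the input is untouched
    for r, c in chosen:
        for i in range(r, min(r + patch_size, height)):
            for j in range(c, min(c + patch_size, width)):
                masked[i][j] = 0
    return masked


if __name__ == "__main__":
    toy_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
    toy_labels = [1, 0, 0]
    print(top1_accuracy(toy_scores, toy_labels))

    toy_image = [[1] * 4 for _ in range(4)]
    for row in mask_patches(toy_image, patch_size=2, fraction=0.5):
        print(row)
```

In a masking study of this kind, one would typically evaluate the same model on progressively more heavily masked inputs and track how top-1 accuracy degrades; the toy data here only shows the mechanics.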

updated: Sat Dec 04 2021 04:28:28 GMT+0000 (UTC)

published: Mon May 17 2021 02:39:22 GMT+0000 (UTC)

arXiv
