Less is More: Pay Less Attention in Vision Transformers

Zizheng Pan; Bohan Zhuang; Haoyu He; Jing Liu; Jianfei Cai

Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision. However, training and inference for earlier Transformers can be prohibitively expensive because self-attention has quadratic complexity over a long sequence of representations, especially for high-resolution dense prediction tasks. To address this, we present a novel Less attention vIsion Transformer (LIT), building on the observation that the early self-attention layers in Transformers still focus on local patterns and bring only minor benefits in recent hierarchical vision Transformers. Specifically, we propose a hierarchical Transformer that uses pure multi-layer perceptrons (MLPs) to encode rich local patterns in the early stages and applies self-attention modules to capture longer dependencies in deeper layers. We further propose a learned deformable token merging module that adaptively fuses informative patches in a non-uniform manner. The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection, and instance segmentation, and can serve as a strong backbone for many vision tasks. Code is available at: https://github.com/MonashAI/LIT
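To make the quadratic-cost argument concrete, the following is a rough back-of-the-envelope sketch and not the authors' implementation. It counts approximate multiply-accumulates for one self-attention layer and one MLP block at each stage of a hierarchical backbone. The stage layout (strides 4, 8, 16, 32 on a 224x224 input), the channel widths, the MLP expansion ratio of 4, and all function names are assumptions chosen for illustration; none of them are taken from the abstract.

```python
def attention_macs(num_tokens: int, dim: int) -> int:
    """Approximate multiply-accumulates for one self-attention layer."""
    projections = 4 * num_tokens * dim * dim      # Q, K, V and output projections
    pairwise = 2 * num_tokens * num_tokens * dim  # QK^T scores plus weighted sum over V
    return projections + pairwise


def mlp_macs(num_tokens: int, dim: int, expansion: int = 4) -> int:
    """Approximate multiply-accumulates for one two-layer MLP block."""
    return 2 * num_tokens * dim * dim * expansion


# Hypothetical (tokens, channels) per stage for a 224x224 image; an assumption,
# not the configuration reported for LIT.
STAGES = [(56 * 56, 64), (28 * 28, 128), (14 * 14, 320), (7 * 7, 512)]


def stage_report(stages=STAGES):
    """Return (stage, tokens, attention MACs, MLP MACs, quadratic share) per stage."""
    rows = []
    for index, (tokens, dim) in enumerate(stages, start=1):
        attn = attention_macs(tokens, dim)
        pairwise = 2 * tokens * tokens * dim
        rows.append((index, tokens, attn, mlp_macs(tokens, dim), pairwise / attn))
    return rows


if __name__ == "__main__":
    for index, tokens, attn, mlp, share in stage_report():
        print(f"stage {index}: tokens={tokens:5d} attention={attn:.2e} "
              f"mlp={mlp:.2e} quadratic share={share:.2f}")
```

Under these assumed sizes, the token-by-token term dominates the attention cost in the first stage and becomes a small fraction in the last one, which is consistent with the abstract's motivation for using MLPs early and reserving self-attention for deeper layers.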

updated: Sat Oct 23 2021 03:42:16 GMT+0000 (UTC)

published: Sat May 29 2021 05:26:07 GMT+0000 (UTC)

arXiv
