CAT: Cross Attention in Vision Transformer

Hezheng Lin; Xing Cheng; Xiangyu Wu; Fan Yang; Dong Shen; Zhongyuan Wang; Qing Song; Wei Yuan

Since Transformer has found widespread use in NLP, the potential of Transformer in CV has been recognized and has inspired many new approaches. However, once an image is tokenized, the computation required to replace word tokens with image patches as Transformer input is vast (e.g., ViT), which bottlenecks model training and inference. In this paper, we propose a new attention mechanism in Transformer termed Cross Attention, which applies attention within each image patch instead of over the whole image to capture local information, and applies attention between image patches divided from single-channel feature maps to capture global information. Both operations require less computation than standard self-attention in Transformer. By alternating attention within patches and between patches, we implement cross attention to maintain performance at lower computational cost and build a hierarchical network called Cross Attention Transformer (CAT) for other vision tasks. Our base model achieves state-of-the-art results on ImageNet-1K and improves the performance of other methods on COCO and ADE20K, illustrating that our network has the potential to serve as a general backbone. The code and models are available at https://github.com/linhezheng19/CAT.
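The two attention patterns described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a single head, no learned query/key/value projections (Q = K = V), no positional bias, and a feature map whose height and width are divisible by the patch size. The function names `inner_patch_attention` and `cross_patch_attention` and the parameter `p` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(tokens):
    # Scaled dot-product self-attention with Q = K = V = tokens
    # (assumption: learned projections are omitted for brevity).
    # tokens: (batch, n_tokens, dim)
    d = tokens.shape[-1]
    scores = tokens @ np.swapaxes(tokens, -1, -2) / np.sqrt(d)
    return softmax(scores) @ tokens


def inner_patch_attention(x, p):
    # Local step: attention among the p*p pixels inside each patch.
    # x: (H, W, C); tokens are pixels, token dimension is C.
    H, W, C = x.shape
    t = x.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    t = t.reshape(-1, p * p, C)            # (num_patches, p*p, C)
    out = attention(t)
    out = out.reshape(H // p, W // p, p, p, C).transpose(0, 2, 1, 3, 4)
    return out.reshape(H, W, C)


def cross_patch_attention(x, p):
    # Global step: each channel is handled on its own (single-channel map);
    # tokens are whole patches, flattened to vectors of length p*p.
    H, W, C = x.shape
    t = x.reshape(H // p, p, W // p, p, C).transpose(4, 0, 2, 1, 3)
    t = t.reshape(C, -1, p * p)            # (C, num_patches, p*p)
    out = attention(t)
    out = out.reshape(C, H // p, W // p, p, p).transpose(1, 3, 2, 4, 0)
    return out.reshape(H, W, C)


# Usage: alternate the local and global steps on a toy feature map.
feature_map = np.random.default_rng(0).normal(size=(8, 8, 3))
local_out = inner_patch_attention(feature_map, p=4)
global_out = cross_patch_attention(local_out, p=4)
print(local_out.shape, global_out.shape)
```

In this sketch, for an H×W×C map with patch size p, standard self-attention over all pixels would form one (HW)×(HW) score matrix. The inner-patch step instead forms HW/p² matrices of size p²×p², and the cross-patch step forms C matrices of size (HW/p²)×(HW/p²), which is where the reduction in computation comes from.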

updated: Thu Jun 10 2021 14:38:32 GMT+0000 (UTC)

published: Thu Jun 10 2021 14:38:32 GMT+0000 (UTC)

arXiv
