Vision Transformer with Deformable Attention

Zhuofan Xia; Xuran Pan; Shiji Song; Li Erran Li; Gao Huang

Transformers have recently shown superior performance on various vision tasks. The large, sometimes even global, receptive field endows Transformer models with higher representation power than their CNN counterparts. Nevertheless, simply enlarging the receptive field also gives rise to several concerns. On the one hand, using dense attention, e.g., in ViT, leads to excessive memory and computational cost, and features can be influenced by irrelevant parts that lie beyond the region of interest. On the other hand, the sparse attention adopted in PVT or Swin Transformer is data-agnostic and may limit the ability to model long-range relations. To mitigate these issues, we propose a novel deformable self-attention module, where the positions of key and value pairs in self-attention are selected in a data-dependent way. This flexible scheme enables the self-attention module to focus on relevant regions and capture more informative features. On this basis, we present Deformable Attention Transformer, a general backbone model with deformable attention for both image classification and dense prediction tasks. Extensive experiments show that our models achieve consistently improved results on comprehensive benchmarks. Code is available at https://github.com/LeapLabTHU/DAT.
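The idea of selecting key and value positions in a data-dependent way can be illustrated with a minimal single-head sketch in NumPy. This is not the authors' implementation (see the repository linked above for that). The function names, the linear offset predictor `Woff`, the `stride` of the reference grid, and the `max_offset` bound are all assumptions made for illustration, since the abstract does not specify how positions are chosen.

```python
import numpy as np

def bilinear_sample(feat, pts):
    """Sample feat (H, W, C) at fractional (y, x) pixel coordinates pts (N, 2)."""
    H, W, _ = feat.shape
    y = np.clip(pts[:, 0], 0, H - 1)   # keep sampling points inside the map
    x = np.clip(pts[:, 1], 0, W - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy = (y - y0)[:, None]
    wx = (x - x0)[:, None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bottom = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def deformable_attention(feat, Wq, Wk, Wv, Woff, stride=2, max_offset=2.0):
    """Single-head attention whose keys/values come from shifted sampling points.

    Assumption: offsets are predicted by a linear map (Woff) applied to the
    features at a uniform reference grid, bounded by tanh * max_offset.
    """
    H, W, C = feat.shape
    q = feat.reshape(-1, C) @ Wq                        # one query per pixel

    # Uniform reference grid of candidate key/value positions.
    ys, xs = np.meshgrid(np.arange(0, H, stride),
                         np.arange(0, W, stride), indexing="ij")
    ref = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)

    # Data-dependent step: shift each reference point by a predicted offset.
    offsets = max_offset * np.tanh(bilinear_sample(feat, ref) @ Woff)
    pts = ref + offsets

    # Keys and values are projected from features sampled at the shifted points.
    sampled = bilinear_sample(feat, pts)
    k = sampled @ Wk
    v = sampled @ Wv

    attn = softmax(q @ k.T / np.sqrt(C), axis=-1)       # (H*W, num_points)
    return (attn @ v).reshape(H, W, C), pts

# Usage with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
Woff = rng.normal(size=(4, 2))
out, pts = deformable_attention(feat, Wq, Wk, Wv, Woff)
print(out.shape, pts.shape)
```

In this sketch every query attends to the same small set of sampled points, so the attention matrix has H*W rows but only (H/stride)*(W/stride) columns, in contrast to dense attention where both dimensions are H*W. Setting `Woff` to zeros reduces the sampling points to the fixed reference grid, which corresponds to a data-agnostic sparse pattern.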

updated: Tue May 24 2022 12:52:37 GMT+0000 (UTC)

published: Mon Jan 03 2022 08:29:01 GMT+0000 (UTC)

arXiv
