Rethinking Local Perception in Lightweight Vision Transformer

Qihang Fan; Huaibo Huang; Jiyang Guan; Ran He

Vision Transformers (ViTs) have proven effective across a range of vision tasks. However, scaling them down to a mobile-friendly size causes significant performance degradation, which makes lightweight vision transformers an important research area. This paper introduces CloFormer, a lightweight vision transformer that leverages context-aware local enhancement. CloFormer examines the relationship between the globally shared weights used in vanilla convolution operators and the token-specific, context-aware weights that arise in attention, and proposes a simple, effective module for capturing high-frequency local information. Specifically, we introduce AttnConv, an attention-style convolution operator. AttnConv uses shared weights to aggregate local information and applies carefully designed context-aware weights to enhance local features. CloFormer combines AttnConv with vanilla attention, which uses pooling to reduce FLOPs, so that the model can perceive both high-frequency and low-frequency information. Extensive experiments on image classification, object detection, and semantic segmentation demonstrate the superiority of CloFormer. The code is available at https://github.com/qhfan/CloFormer.
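The two kinds of weights contrasted in the abstract can be illustrated with a toy 1D sketch on scalar tokens. This is not the authors' implementation (see the linked repository for that). The function names `shared_conv`, `attn_conv_like`, and `pooled_attention` are hypothetical, and the gate `tanh(q * k)` is an assumed stand-in for the paper's "carefully designed context-aware weights", chosen only because it gives a bounded, token-specific weight.

```python
import math

def shared_conv(x, kernel):
    """Aggregate neighbors with ONE kernel shared by every position (zero padding)."""
    r = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            p = i + j - r
            if 0 <= p < len(x):
                acc += w * x[p]
        out.append(acc)
    return out

def attn_conv_like(q, k, v, kernel):
    """Local branch: shared-weight aggregation of v, then a per-token gate."""
    v_local = shared_conv(v, kernel)  # globally shared weights
    q_local = shared_conv(q, kernel)
    k_local = shared_conv(k, kernel)
    # ASSUMPTION: tanh(q * k) as the context-aware weight; it differs per token
    # because it depends on the local content of q and k at that position.
    gate = [math.tanh(a * b) for a, b in zip(q_local, k_local)]
    return [g * u for g, u in zip(gate, v_local)]

def pooled_attention(q, k, v, pool):
    """Global branch: softmax attention over average-pooled keys and values."""
    def avg_pool(x):
        chunks = [x[i:i + pool] for i in range(0, len(x), pool)]
        return [sum(c) / len(c) for c in chunks]
    kp, vp = avg_pool(k), avg_pool(v)  # fewer keys/values, so fewer multiply-adds
    out = []
    for qi in q:
        scores = [qi * kj for kj in kp]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append(sum(e / z * vj for e, vj in zip(exps, vp)))
    return out

if __name__ == "__main__":
    tokens = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]
    local = attn_conv_like(tokens, tokens, tokens, [0.25, 0.5, 0.25])
    glob = pooled_attention(tokens, tokens, tokens, pool=2)
    print(local)
    print(glob)
```

In this sketch the kernel passed to `shared_conv` is identical at every position, while the gate changes from token to token. Pooling in `pooled_attention` shrinks the number of keys and values by roughly the factor `pool`, which is where the reduction in attention cost comes from.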

updated: Fri May 12 2023 03:51:24 GMT+0000 (UTC)

published: Fri Mar 31 2023 05:25:32 GMT+0000 (UTC)

arXiv
