Rethinking Local Perception in Lightweight Vision Transformer

Qihang Fan; Huaibo Huang; Jiyang Guan; Ran He

Vision Transformers (ViTs) have been shown to be effective in various vision tasks. However, scaling them down to a mobile-friendly size leads to significant performance degradation, so developing lightweight vision transformers has become a crucial area of research. This paper introduces CloFormer, a lightweight vision transformer that leverages context-aware local enhancement. CloFormer explores the relationship between the globally shared weights typically used in vanilla convolutional operators and the token-specific, context-aware weights that appear in attention, and then proposes a simple and effective module for capturing high-frequency local information. In CloFormer, we introduce AttnConv, an attention-style convolution operator. AttnConv uses shared weights to aggregate local information and applies carefully designed context-aware weights to enhance local features. Combining AttnConv with vanilla attention, which uses pooling to reduce FLOPs, enables CloFormer to perceive both high-frequency and low-frequency information. Extensive experiments on image classification, object detection, and semantic segmentation demonstrate the superiority of CloFormer.
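The two-branch idea in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: it works on a 1D sequence of scalar tokens, and every name in it (`shared_conv`, `attn_conv_toy`, `pooled_attention_toy`, `clo_block_toy`), the `tanh` gate, the smoothing kernel, and the choice to pair the two branch outputs per token are assumptions made purely for illustration. The abstract only states that shared weights aggregate local information, that context-aware weights enhance it, and that vanilla attention uses pooling to reduce FLOPs.

```python
import math


def shared_conv(x, kernel):
    """Local aggregation with globally shared weights (zero-padded 1D convolution)."""
    r = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - r
            if 0 <= idx < len(x):  # positions outside the sequence count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def attn_conv_toy(x, kernel):
    """Toy local branch: shared-weight aggregation followed by a token-specific gate."""
    # A real model would use separate learned projections and kernels for q, k, v;
    # reusing one kernel here is a simplification.
    q = shared_conv(x, kernel)
    k = shared_conv(x, kernel)
    v = shared_conv(x, kernel)
    # Context-aware weight per token, bounded in (-1, 1). The tanh form is an assumption.
    gate = [math.tanh(qi * ki) for qi, ki in zip(q, k)]
    return [g * vi for g, vi in zip(gate, v)]


def pooled_attention_toy(x, pool):
    """Toy global branch: every token attends to average-pooled keys and values."""
    pooled = []
    for i in range(0, len(x), pool):
        window = x[i:i + pool]
        pooled.append(sum(window) / len(window))
    out = []
    for q in x:
        scores = [q * k for k in pooled]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append(sum(e / z * v for e, v in zip(exps, pooled)))
    return out


def clo_block_toy(x, kernel=(0.25, 0.5, 0.25), pool=2):
    """Pair the local (high-frequency) and global (low-frequency) outputs per token."""
    local = attn_conv_toy(x, list(kernel))
    global_ = pooled_attention_toy(x, pool)
    return list(zip(local, global_))
```

Calling `clo_block_toy([0.0, 1.0, 0.0, 1.0])` returns one `(local, global)` pair per input token. Pooling shortens the key/value sequence by a factor of `pool`, which is where the attention branch saves computation in this sketch.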

updated: Fri Mar 31 2023 05:25:32 GMT+0000 (UTC)

published: Fri Mar 31 2023 05:25:32 GMT+0000 (UTC)

arXiv
