MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens

Jiemin Fang; Lingxi Xie; Xinggang Wang; Xiaopeng Zhang; Wenyu Liu; Qi Tian

Transformers have offered a new methodology for designing neural networks for visual recognition. Compared to convolutional networks, Transformers can refer to global features at each stage, yet the attention module brings higher computational overhead, which hinders applying Transformers to high-resolution visual data. This paper aims to alleviate the conflict between efficiency and flexibility. To that end, we propose a specialized token for each region that serves as a messenger (MSG). By manipulating these MSG tokens, one can flexibly exchange visual information across regions while reducing computational complexity. We then integrate the MSG token into a multi-scale architecture named MSG-Transformer. On standard image classification and object detection, MSG-Transformer achieves competitive performance and accelerates inference on both GPU and CPU. Code is available at https://github.com/hustvl/MSG-Transformer.
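The idea of one messenger token per region can be illustrated with a small NumPy sketch. It is not the code from the repository above and rests on several assumptions that the abstract does not state: regions are taken to be non-overlapping square windows, attention is a plain single-head self-attention without learned projections, and the "manipulation" of MSG tokens is modelled as a channel-group shuffle between windows. All function names (`window_partition`, `local_attention`, `shuffle_msg`, `msg_block`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def window_partition(x, w):
    """(H, W, C) feature map -> (num_windows, w*w, C) groups of local tokens."""
    H, W, C = x.shape
    x = x.reshape(H // w, w, W // w, w, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, w * w, C)

def local_attention(tokens):
    """Self-attention restricted to each window (assumption: no learned weights)."""
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(tokens.shape[-1])
    return softmax(scores) @ tokens

def shuffle_msg(msg):
    """Assumed exchange step: split each MSG token into channel groups and
    regroup them so that output token g holds group g of every window.
    msg: (num_windows, C), with C divisible by num_windows."""
    n, C = msg.shape
    return msg.reshape(n, n, C // n).transpose(1, 0, 2).reshape(n, C)

def msg_block(x, msg, w):
    """One block: attach a MSG token to each window, attend locally,
    then exchange information between windows through the MSG tokens only."""
    windows = window_partition(x, w)                              # (n, w*w, C)
    tokens = np.concatenate([msg[:, None, :], windows], axis=1)  # (n, 1 + w*w, C)
    tokens = local_attention(tokens)
    new_msg, new_windows = tokens[:, 0], tokens[:, 1:]
    return new_windows, shuffle_msg(new_msg)

rng = np.random.default_rng(0)
feature_map = rng.normal(size=(8, 8, 16))   # toy 8x8 map with 16 channels
msg_tokens = rng.normal(size=(4, 16))       # one MSG token per 4x4 window
local_out, msg_out = msg_block(feature_map, msg_tokens, w=4)
```

In this toy setting, global attention over all 64 tokens would score 64 × 64 = 4096 token pairs, whereas four windows of 17 tokens each (16 local tokens plus one MSG token) score 4 × 17 × 17 = 1156 pairs. This counts attention scores only and says nothing about the speedups reported for the actual model.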

updated: Wed Dec 01 2021 12:21:25 GMT+0000 (UTC)

published: Mon May 31 2021 17:16:42 GMT+0000 (UTC)

arXiv
