Coordinate Attention for Efficient Mobile Network Design

Qibin Hou; Daquan Zhou; Jiashi Feng

Recent studies on mobile network design have demonstrated the remarkable effectiveness of channel attention (e.g., Squeeze-and-Excitation attention) for lifting model performance, but they generally neglect positional information, which is important for generating spatially selective attention maps. In this paper, we propose a novel attention mechanism for mobile networks that embeds positional information into channel attention, which we call "coordinate attention". Unlike channel attention, which transforms a feature tensor into a single feature vector via 2D global pooling, coordinate attention factorizes channel attention into two 1D feature encoding processes that aggregate features along the two spatial directions, respectively. In this way, long-range dependencies can be captured along one spatial direction while precise positional information is preserved along the other. The resulting feature maps are then encoded separately into a pair of direction-aware and position-sensitive attention maps that can be applied complementarily to the input feature map to augment the representations of the objects of interest. Our coordinate attention is simple and can be flexibly plugged into classic mobile networks, such as MobileNetV2, MobileNeXt, and EfficientNet, with nearly no computational overhead. Extensive experiments demonstrate that our coordinate attention is not only beneficial to ImageNet classification but, more interestingly, also performs better in downstream tasks such as object detection and semantic segmentation. Code is available at https://github.com/Andrew-Qibin/CoordAttention.
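
The factorization described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the repository linked above for that). It assumes a single (C, H, W) feature map, uses plain matrix products in place of 1x1 convolutions, substitutes ReLU for whatever normalization and nonlinearity the real block uses, and draws random weights. The names `coordinate_attention`, `w_shared`, `w_h`, and `w_w` are hypothetical and chosen only for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def coordinate_attention(x, w_shared, w_h, w_w):
    """Simplified coordinate attention on one feature map.

    x:        (C, H, W) input features
    w_shared: (M, C) channel-mixing weights applied to both pooled maps
    w_h, w_w: (C, M) weights producing the two attention maps
    All weights are illustrative assumptions, not trained parameters.
    """
    _, h, _ = x.shape

    # Two 1D encodings: average over one spatial axis, keep the other.
    pooled_h = x.mean(axis=2)  # (C, H): each row summarized across its width
    pooled_w = x.mean(axis=1)  # (C, W): each column summarized across its height

    # Mix channels jointly for both directions, then split them again.
    y = np.concatenate([pooled_h, pooled_w], axis=1)  # (C, H + W)
    y = np.maximum(w_shared @ y, 0.0)                 # (M, H + W), ReLU assumed
    y_h, y_w = y[:, :h], y[:, h:]

    # One attention map per direction, each with values in (0, 1).
    a_h = sigmoid(w_h @ y_h)  # (C, H)
    a_w = sigmoid(w_w @ y_w)  # (C, W)

    # Apply both maps to the input via broadcasting.
    out = x * a_h[:, :, None] * a_w[:, None, :]
    return out, a_h, a_w


rng = np.random.default_rng(0)
C, H, W, M = 8, 5, 7, 4  # M is an assumed reduced channel count
x = rng.standard_normal((C, H, W))
w_shared = 0.1 * rng.standard_normal((M, C))
w_h = 0.1 * rng.standard_normal((C, M))
w_w = 0.1 * rng.standard_normal((C, M))

out, a_h, a_w = coordinate_attention(x, w_shared, w_h, w_w)
print(out.shape, a_h.shape, a_w.shape)
```

By contrast, a channel attention block with 2D global pooling would reduce `x` to a single length-C vector (`x.mean(axis=(1, 2))`), so its weights could not vary with row or column position. Here the weight at position (i, j) is the product `a_h[:, i] * a_w[:, j]`, which is what makes the attention position-sensitive.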

updated: Thu Mar 04 2021 09:18:02 GMT+0000 (UTC)

published: Thu Mar 04 2021 09:18:02 GMT+0000 (UTC)

arXiv
