DRTAM: Dual Rank-1 Tensor Attention Module

Hanxing Chi; Baihong Lin; Jun Hu; Liang Wang

Attention mechanisms have recently been investigated extensively in computer vision, but few of them perform well on both large and mobile networks. This paper proposes the Dual Rank-1 Tensor Attention Module (DRTAM), a novel attention module for feed-forward convolutional neural networks that is guided by residual attention learning. Given a 3D feature tensor map, DRTAM first generates three 2D feature descriptors along the three axes. Then, using the three descriptors, DRTAM sequentially infers two rank-1 tensor attention maps, the initial attention map and the complement attention map, combines them, and multiplies the result with the input feature map for adaptive feature refinement (see Fig. 1(c)). To generate the two attention maps, DRTAM introduces a rank-1 tensor attention module (RTAM) and a residual descriptors extraction module (RDEM). RTAM divides each 2D feature descriptor into several chunks and generates the three factor vectors of a rank-1 tensor attention map by applying strip pooling to each chunk, so that local and long-range contextual information can be captured along each of the three dimensions. RDEM generates three 2D feature descriptors of the residual feature, using the three factor vectors of the initial attention map and the three descriptors of the input feature; these residual descriptors are then used to produce the complement attention map. Extensive experimental results on ImageNet-1K, MS COCO, and PASCAL VOC demonstrate that DRTAM achieves competitive performance on both large and mobile networks compared with other state-of-the-art attention modules.
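
To make the phrase "rank-1 tensor attention map" concrete, the following is a minimal NumPy sketch of a single such map. It is not the authors' implementation. The abstract does not specify the pooling operator, the chunking scheme, or any learned layers, so the sketch assumes plain mean pooling throughout and omits chunking, RDEM, and the complement map entirely. All function names here (`axis_descriptors`, `factor_vectors`, `rank1_attention`, `refine`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def axis_descriptors(x):
    """Return three 2D descriptors of a (C, H, W) tensor, one per axis.

    Assumption: mean pooling; the abstract does not name the operator.
    """
    # Shapes: (H, W), (C, W), (C, H)
    return x.mean(axis=0), x.mean(axis=1), x.mean(axis=2)


def factor_vectors(d_c, d_h, d_w):
    """Pool the 2D descriptors into one factor vector per axis.

    Assumption: a simplified stand-in for RTAM. Each descriptor is pooled
    along whole rows or columns (a strip), with no chunking and no
    learned weights.
    """
    a_c = sigmoid(0.5 * (d_h.mean(axis=1) + d_w.mean(axis=1)))  # (C,)
    a_h = sigmoid(0.5 * (d_c.mean(axis=1) + d_w.mean(axis=0)))  # (H,)
    a_w = sigmoid(0.5 * (d_c.mean(axis=0) + d_h.mean(axis=0)))  # (W,)
    return a_c, a_h, a_w


def rank1_attention(a_c, a_h, a_w):
    """Outer product of three vectors: A[c, h, w] = a_c[c] * a_h[h] * a_w[w]."""
    return np.einsum("c,h,w->chw", a_c, a_h, a_w)


def refine(x):
    """Multiply the input feature map by one rank-1 attention map."""
    a_c, a_h, a_w = factor_vectors(*axis_descriptors(x))
    return x * rank1_attention(a_c, a_h, a_w)
```

The point of the rank-1 form is that a full C x H x W attention map is determined by only C + H + W values, one factor vector per axis, which is what makes a 3D attention map affordable on small networks.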

updated: Fri Mar 11 2022 12:52:44 GMT+0000 (UTC)

published: Fri Mar 11 2022 12:52:44 GMT+0000 (UTC)

arXiv

