RFAConv: Innovating Spatial Attention and Standard Convolutional Operation

Xin Zhang; Chen Liu; Degang Yang; Tingting Song; Yichen Ye; Ke Li; Yingze Song

Spatial attention has been widely used to improve the performance of convolutional neural networks, but it has certain limitations. In this paper, we propose a new perspective on why spatial attention is effective: the spatial attention mechanism essentially solves the problem of convolutional kernel parameter sharing. However, the information contained in the attention map generated by spatial attention is not sufficient for large-size convolutional kernels. We therefore propose a novel attention mechanism called Receptive-Field Attention (RFA). Existing spatial attention mechanisms, such as the Convolutional Block Attention Module (CBAM) and Coordinate Attention (CA), focus only on spatial features and so do not fully address the problem of convolutional kernel parameter sharing. In contrast, RFA not only focuses on receptive-field spatial features but also provides effective attention weights for large-size convolutional kernels. The Receptive-Field Attention convolutional operation (RFAConv), built on RFA, represents a new approach to replacing the standard convolution operation. It adds nearly negligible computational cost and parameters while significantly improving network performance. We conducted a series of experiments on the ImageNet-1k, COCO, and VOC datasets to demonstrate the superiority of our approach. Most importantly, we believe it is time for current spatial attention mechanisms to shift their focus from spatial features to receptive-field spatial features. In this way, we can further improve network performance and achieve even better results. The code and pre-trained models for the relevant tasks can be found at https://github.com/Liuchen1997/RFAConv.
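The idea of weighting each receptive field separately can be illustrated with a toy sketch. It is not the authors' implementation (that is in the repository linked above). It assumes a single-channel input, stride 1, and no padding, and it scores each window with a plain softmax over the raw window values as a stand-in for the learned attention in the paper. The function names `unfold`, `rfa_weights`, and `rfa_conv` are hypothetical and chosen only for this illustration.

```python
import math

def unfold(x, k):
    """Collect the flattened k*k window for every valid output position."""
    h, w = len(x), len(x[0])
    return [
        [[x[i + di][j + dj] for di in range(k) for dj in range(k)]
         for j in range(w - k + 1)]
        for i in range(h - k + 1)
    ]

def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def standard_conv(x, kernel):
    """Standard convolution: the same kernel is shared by every window."""
    k = len(kernel)
    flat = [kernel[a][b] for a in range(k) for b in range(k)]
    return [[sum(p * q for p, q in zip(win, flat)) for win in row]
            for row in unfold(x, k)]

def rfa_weights(x, k):
    """One attention vector of length k*k per window.

    ASSUMPTION: a softmax over the raw window values replaces the
    learned attention branch; it only illustrates the shape of the idea.
    """
    return [[softmax(win) for win in row] for row in unfold(x, k)]

def rfa_conv(x, kernel):
    """Convolution whose effective kernel (attention * kernel) varies per window."""
    k = len(kernel)
    flat = [kernel[a][b] for a in range(k) for b in range(k)]
    windows = unfold(x, k)
    weights = rfa_weights(x, k)
    return [[sum(a * p * q for a, p, q in zip(attn, win, flat))
             for attn, win in zip(w_row, x_row)]
            for w_row, x_row in zip(weights, windows)]

if __name__ == "__main__":
    x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
    kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
    print(standard_conv(x, kernel))
    w = rfa_weights(x, 3)
    # The pixel x[1][1] sits at index 4 of window (0, 0) and index 3 of window (0, 1).
    print(w[0][0][4], w[0][1][3])
```

In this sketch the same input pixel receives a different weight in each overlapping window, so the effective kernel is no longer shared across positions. A per-pixel spatial attention map, by contrast, would give that pixel a single weight that all overlapping windows reuse.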

updated: Thu Mar 28 2024 12:07:44 GMT+0000 (UTC)

published: Thu Apr 06 2023 16:21:56 GMT+0000 (UTC)

arXiv
