ParCNetV2: Oversized Kernel with Enhanced Attention

Ruihan Xu; Haokui Zhang; Wenze Hu; Shiliang Zhang; Xiaoyu Wang

Transformers have shown great potential in various computer vision tasks. By borrowing design concepts from transformers, many studies have revolutionized CNNs and shown remarkable results. This paper falls within this line of studies. Specifically, we propose a new convolutional neural network, ParCNetV2, that extends position-aware circular convolution (ParCNet) with oversized convolutions and bifurcate gate units to enhance attention. The oversized convolution employs a kernel twice the size of the input to model long-range dependencies through a global receptive field. At the same time, it achieves implicit positional encoding by removing the shift-invariant property from convolution kernels: when the kernel is twice as large as the input, the effective kernels at different spatial locations are different. The bifurcate gate unit implements an attention mechanism similar to self-attention in transformers. It is applied through element-wise multiplication of two branches, one serving as the feature transformation and the other as the attention weights. Additionally, we introduce a uniform local-global convolution block to unify the design of the early-stage and late-stage convolution blocks. Extensive experiments demonstrate the superiority of our method over other convolutional neural networks and over hybrid models that combine CNNs and transformers. Code will be released.
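The two mechanisms described in the abstract can be illustrated with a minimal 1D NumPy sketch. This is not the authors' implementation. It assumes zero padding, a kernel of 2N - 1 taps (roughly twice the input length N, with the odd length chosen here so that the kernel has a center tap), and a plain element-wise product for the gate. The function names `oversized_conv1d`, `effective_kernel`, and `bifurcate_gate` are hypothetical.

```python
import numpy as np


def oversized_conv1d(x, w):
    """Zero-padded 1D cross-correlation with a kernel of 2N - 1 taps.

    Assumption for this sketch: N = len(x), and the input is padded with
    N - 1 zeros on each side so the output also has length N.
    """
    n = x.shape[0]
    if w.shape[0] != 2 * n - 1:
        raise ValueError("kernel must have 2 * len(x) - 1 taps")
    padded = np.pad(x, (n - 1, n - 1))  # constant zero padding
    return np.array([padded[i:i + 2 * n - 1] @ w for i in range(n)])


def effective_kernel(w, i, n):
    """Slice of w that multiplies the n real (non-padded) inputs at output i."""
    return w[n - 1 - i:2 * n - 1 - i]


def bifurcate_gate(features, attention):
    """Element-wise product of a feature branch and an attention branch."""
    return features * attention


rng = np.random.default_rng(0)
n = 4
x = rng.standard_normal(n)
w = rng.standard_normal(2 * n - 1)

y = oversized_conv1d(x, w)

# Every output position sees the whole input (global receptive field),
# but through a different window of the kernel weights.
for i in range(n):
    assert np.isclose(y[i], effective_kernel(w, i, n) @ x)

# Stand-in attention branch: a pointwise scaling of the input (an assumption;
# the abstract does not specify how the attention branch is computed).
gated = bifurcate_gate(y, 0.5 * x)
```

Because each output position uses a different slice of `w`, the operation is no longer shift-invariant, which is the sense in which the oversized kernel carries implicit positional information.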

updated: Thu Mar 16 2023 02:38:06 GMT+0000 (UTC)

published: Mon Nov 14 2022 07:22:55 GMT+0000 (UTC)

arXiv
