ParCNetV2: Oversized Kernel with Enhanced Attention

Ruihan Xu; Haokui Zhang; Wenze Hu; Shiliang Zhang; Xiaoyu Wang

Transformers have achieved tremendous success in various computer vision tasks. By borrowing design concepts from transformers, many studies have revolutionized CNNs and shown remarkable results. This paper falls within this line of work. More specifically, we introduce a convolutional neural network architecture named ParCNetV2, which extends position-aware circular convolution (ParCNet) with oversized convolutions and strengthens attention through bifurcate gate units. The oversized convolution uses a kernel twice the input size to model long-range dependencies through a global receptive field. At the same time, it achieves implicit positional encoding by removing the shift-invariant property from convolutional kernels: when the kernel is twice as large as the input, the effective kernels at different spatial locations differ. The bifurcate gate unit implements an attention mechanism similar to self-attention in transformers. It splits the input into two branches: one serves as the feature transformation, and the other provides the attention weights. The attention is applied through element-wise multiplication of the two branches. In addition, we introduce a unified local-global convolution block to unify the design of the early- and late-stage convolutional blocks. Extensive experiments demonstrate that our method outperforms other pure convolutional neural networks as well as neural networks hybridizing CNNs and transformers.
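
The two mechanisms described in the abstract can be illustrated with a minimal 1D sketch in plain Python. It is not the authors' implementation. The function names `oversized_conv1d` and `bifurcate_gate`, the kernel length of 2N - 1 with zero padding, and the sigmoid applied to the gating branch are all assumptions made for illustration; the abstract specifies only a kernel of roughly twice the input size and an element-wise product of two branches.

```python
import math


def oversized_conv1d(x, kernel):
    """Zero-padded 1D cross-correlation with an oversized kernel.

    Assumption: the kernel has length 2*N - 1 for an input of length N,
    which is one way to realise "a kernel twice the input size".
    """
    n = len(x)
    assert len(kernel) == 2 * n - 1, "kernel must have length 2*N - 1"
    out = []
    for i in range(n):
        # Every output position sees the whole input (global receptive
        # field), but through a different window of kernel taps, so the
        # effective kernel depends on the position i.
        effective = kernel[n - 1 - i : 2 * n - 1 - i]
        out.append(sum(w * v for w, v in zip(effective, x)))
    return out


def bifurcate_gate(features, gate_logits):
    """Element-wise gating of one branch by the other.

    Assumption: the gating branch is squashed with a sigmoid before the
    product; the abstract only states that the branches are multiplied.
    """
    weights = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    return [f * w for f, w in zip(features, weights)]


if __name__ == "__main__":
    # A constant input still produces position-dependent outputs, which
    # is the sense in which the operation is not shift-invariant.
    print(oversized_conv1d([1, 1, 1, 1], [1, 2, 3, 4, 5, 6, 7]))
    # → [22, 18, 14, 10]

    # Gate logits of zero give weights of 0.5 for every element.
    print(bifurcate_gate([2.0, 4.0], [0.0, 0.0]))
    # → [1.0, 2.0]
```

With an ordinary small kernel or a circular convolution, a constant input would yield a constant output in the interior of the signal. Here each position reads a different slice of the kernel, so the output encodes position implicitly.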

updated: Mon Mar 13 2023 08:57:52 GMT+0000 (UTC)

published: Mon Nov 14 2022 07:22:55 GMT+0000 (UTC)

arXiv
