Omni-Dimensional Dynamic Convolution

Chao Li; Aojun Zhou; Anbang Yao


Learning a single static convolutional kernel in each convolutional layer is the common training paradigm of modern Convolutional Neural Networks (CNNs). Instead, recent research on dynamic convolution shows that learning a linear combination of n convolutional kernels weighted by their input-dependent attentions can significantly improve the accuracy of lightweight CNNs while maintaining efficient inference. However, we observe that existing works endow convolutional kernels with the dynamic property through only one dimension of the kernel space (the convolutional kernel number), while the other three dimensions (the spatial size, the input channel number, and the output channel number of each convolutional kernel) are overlooked. Inspired by this, we present Omni-dimensional Dynamic Convolution (ODConv), a more generalized yet elegant dynamic convolution design, to advance this line of research. ODConv leverages a novel multi-dimensional attention mechanism with a parallel strategy to learn complementary attentions for convolutional kernels along all four dimensions of the kernel space at any convolutional layer. As a drop-in replacement for regular convolutions, ODConv can be plugged into many CNN architectures. Extensive experiments on the ImageNet and MS-COCO datasets show that ODConv brings solid accuracy boosts for various prevailing CNN backbones, both lightweight and large, e.g., absolute top-1 improvements of 3.77% to 5.71% for the MobileNetV2 family and 1.86% to 3.72% for the ResNet family on the ImageNet dataset. Intriguingly, thanks to its improved feature learning ability, ODConv with even a single kernel can compete with or outperform existing dynamic convolution counterparts that use multiple kernels, substantially reducing the number of extra parameters. Furthermore, ODConv is also superior to other attention modules for modulating the output features or the convolutional weights.
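The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the idea it describes: n candidate kernels are each scaled by attentions along the four kernel-space dimensions (kernel number, output channels, input channels, spatial positions) and then summed into one effective kernel. The function name `odconv_kernel`, the per-kernel attention shapes, and the softmax/sigmoid stand-ins for the input-dependent attention values are assumptions made for illustration, not the authors' code.

```python
import numpy as np


def odconv_kernel(W, a_w, a_f, a_c, a_s):
    """Combine n candidate kernels into one effective kernel (hypothetical helper).

    W   : (n, c_out, c_in, k, k)  candidate convolutional kernels
    a_w : (n,)                    attention over the kernel number
    a_f : (n, c_out)              attention over output channels
    a_c : (n, c_in)               attention over input channels
    a_s : (n, k, k)               attention over spatial positions
    Returns an array of shape (c_out, c_in, k, k).
    """
    # Multiply every kernel entry by its four attention scalars,
    # then sum over the kernel-number axis n.
    return np.einsum("n,no,nc,nij,nocij->ocij", a_w, a_f, a_c, a_s, W)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
n, c_out, c_in, k = 4, 8, 3, 3
W = rng.standard_normal((n, c_out, c_in, k, k))

# Random stand-ins for attentions that would normally be predicted from the input.
logits = rng.standard_normal(n)
a_w = np.exp(logits) / np.exp(logits).sum()       # assumed: softmax over the n kernels
a_f = sigmoid(rng.standard_normal((n, c_out)))    # assumed: sigmoid gates
a_c = sigmoid(rng.standard_normal((n, c_in)))
a_s = sigmoid(rng.standard_normal((n, k, k)))

W_eff = odconv_kernel(W, a_w, a_f, a_c, a_s)
print(W_eff.shape)
```

When `a_f`, `a_c`, and `a_s` are all ones, the result reduces to the weighted sum of kernels used by earlier dynamic convolution methods, which attend only along the kernel-number dimension. The effective kernel can then be passed to any ordinary convolution routine.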

updated: Fri Sep 16 2022 14:05:38 GMT+0000 (UTC)

published: Fri Sep 16 2022 14:05:38 GMT+0000 (UTC)

arXiv
