Involution: Inverting the Inherence of Convolution for Visual Recognition

Duo Li; Jie Hu; Changhu Wang; Xiangtai Li; Qi She; Lei Zhu; Tong Zhang; Qifeng Chen

Convolution has been the core ingredient of modern neural networks, triggering the surge of deep learning in vision. In this work, we rethink the inherent principles of standard convolution for vision tasks, specifically its spatial-agnostic and channel-specific nature. We instead present a novel atomic operation for deep neural networks, coined involution, obtained by inverting these design principles of convolution. We additionally demystify the recently popular self-attention operator and subsume it into our involution family as an over-complicated instantiation. The proposed involution operator can serve as a fundamental building block for a new generation of neural networks for visual recognition, powering different deep learning models on several prevalent benchmarks, including ImageNet classification, COCO detection and segmentation, and Cityscapes segmentation. Our involution-based models improve on convolutional baselines using ResNet-50 by up to 1.6% top-1 accuracy, 2.5% and 2.4% bounding box AP, and 4.7% mean IoU in absolute terms, while compressing the computational cost to 66%, 65%, 72%, and 57% of the baseline on the above benchmarks, respectively. Code and pre-trained models for all the tasks are available at https://github.com/d-li14/involution.
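To make the inversion concrete, the following is a minimal NumPy sketch of an operator that is spatial-specific (a separate K x K kernel at every pixel) and channel-agnostic (that kernel is shared by all channels in a group). It is an illustration under stated assumptions, not the authors' implementation: the function name `involution2d` and the single linear kernel generator `w_gen` are hypothetical, and stride 1 with zero padding is assumed. The released code at the URL above is the authoritative reference.

```python
import numpy as np


def involution2d(x, w_gen, kernel_size=3, groups=1):
    """Naive involution-style operator (illustrative sketch only).

    x:     feature map of shape (C, H, W)
    w_gen: hypothetical linear kernel generator of shape (groups * K * K, C)
    Returns an array of shape (C, H, W), assuming stride 1 and zero padding.
    """
    c, h, w = x.shape
    k = kernel_size
    pad = k // 2
    assert c % groups == 0, "channels must divide evenly into groups"
    assert w_gen.shape == (groups * k * k, c)

    # Spatial-specific: generate one K x K kernel per pixel (and per group)
    # from the feature vector located at that pixel.
    kernels = np.einsum("oc,chw->ohw", w_gen, x).reshape(groups, k, k, h, w)

    # Zero-pad the spatial axes and split channels into groups.
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    xg = xp.reshape(groups, c // groups, h + 2 * pad, w + 2 * pad)

    out = np.zeros((groups, c // groups, h, w), dtype=np.result_type(x, w_gen))
    for i in range(k):
        for j in range(k):
            # Channel-agnostic: the same kernel value multiplies every
            # channel of the group (broadcast over the channel axis).
            out += kernels[:, None, i, j] * xg[:, :, i:i + h, j:j + w]
    return out.reshape(c, h, w)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8, 8))       # C=4, H=W=8
    w_gen = rng.standard_normal((2 * 9, 4))  # groups=2, K=3
    y = involution2d(x, w_gen, kernel_size=3, groups=2)
    print(y.shape)
```

By contrast, a standard convolution would reuse one kernel at every spatial position and hold distinct weights per channel. In this sketch the number of generated kernel values per pixel depends on `groups` and `kernel_size` rather than on the channel count.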

updated: Sun Apr 11 2021 12:30:11 GMT+0000 (UTC)

published: Wed Mar 10 2021 18:40:46 GMT+0000 (UTC)

arXiv
