CT-Net: Channel Tensorization Network for Video Classification

Kunchang Li; Xianhang Li; Yali Wang; Jun Wang; Yu Qiao

Since 3D convolution is powerful for video classification but often computationally expensive, recent studies mainly focus on decomposing it along spatial-temporal and/or channel dimensions. Unfortunately, most approaches fail to achieve a desirable balance between convolutional efficiency and sufficient feature interaction. For this reason, we propose a concise and novel Channel Tensorization Network (CT-Net), which treats the channel dimension of the input feature as a product of K sub-dimensions. On the one hand, this naturally factorizes convolution along multiple dimensions, leading to a light computational burden. On the other hand, it effectively enhances feature interaction across channels and progressively enlarges the 3D receptive field of such interaction to boost classification accuracy. Furthermore, we equip our CT-Module with a Tensor Excitation (TE) mechanism, which learns to exploit spatial, temporal and channel attention in a high-dimensional manner to improve the cooperative power of all feature dimensions in the CT-Module. Finally, we flexibly adapt ResNet into our CT-Net. Extensive experiments are conducted on several challenging video benchmarks, e.g., Kinetics-400, Something-Something V1 and V2. Our CT-Net outperforms a number of recent SOTA approaches in terms of accuracy and/or efficiency. The code and models will be available at https://github.com/Andy1621/CT-Net.
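The abstract's two claims about channel tensorization (less computation, yet interaction among all channels) can be checked with a small counting exercise. The pure-Python sketch below is a simplification, not the authors' CT-Module. It assumes each stage is a grouped convolution that mixes channels along one sub-dimension only, with the remaining sub-dimensions acting as groups. It ignores biases, the Tensor Excitation mechanism and any spatial-temporal kernel factorization. The helper names (`to_multi_index`, `mix_along`, `receptive_channels`, `weight_counts`) are hypothetical.

```python
from math import prod


def to_multi_index(c, dims):
    """Map a flat channel index c to a row-major multi-index over sub-dimensions `dims`."""
    idx = []
    for d in reversed(dims):
        idx.append(c % d)
        c //= d
    return tuple(reversed(idx))


def mix_along(dims, k):
    """Input channels each output channel sees when mixing ONLY along sub-dimension k.

    Output channel o and input channel i are connected iff their multi-indices agree
    on every sub-dimension except k (assumption: a grouped conv whose groups are the
    other sub-dimensions).
    """
    C = prod(dims)
    multi = [to_multi_index(c, dims) for c in range(C)]
    return [
        {i for i in range(C)
         if all(multi[i][j] == multi[o][j] for j in range(len(dims)) if j != k)}
        for o in range(C)
    ]


def receptive_channels(dims):
    """Compose the K per-sub-dimension stages; return, for each output channel,
    the set of input channels it depends on after all stages."""
    C = prod(dims)
    reach = [{c} for c in range(C)]  # before any stage, each channel sees only itself
    for k in range(len(dims)):
        stage = mix_along(dims, k)
        # Output o now sees everything its stage-k inputs already saw.
        reach = [set().union(*(reach[m] for m in stage[o])) for o in range(C)]
    return reach


def weight_counts(dims, kernel=(3, 3, 3)):
    """Weights of one dense 3D conv (C -> C) vs. K factorized grouped convs.

    Stage k has C // dims[k] groups of size dims[k], i.e. C * dims[k] * kernel_volume
    weights (biases ignored).
    """
    C, vol = prod(dims), prod(kernel)
    dense = C * C * vol
    factorized = sum(C * d * vol for d in dims)
    return dense, factorized
```

For example, with C = 64 tensorized as 4 × 4 × 4 and a 3 × 3 × 3 kernel, `weight_counts((4, 4, 4))` returns `(110592, 20736)`, i.e. 18.75% of the dense weights. `receptive_channels((4, 4, 4))` shows that every output channel still depends on all 64 input channels after the three stages, whereas a single stage reaches only 4. In general the weight ratio is (C1 + ... + CK) / C under these assumptions.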

updated: Thu Jun 03 2021 05:35:43 GMT+0000 (UTC)

published: Thu Jun 03 2021 05:35:43 GMT+0000 (UTC)

arXiv