Gated Channel Transformation for Visual Recognition

Zongxin Yang; Linchao Zhu; Yu Wu; Yi Yang

In this work, we propose a generally applicable transformation unit for visual recognition with deep convolutional neural networks. The transformation explicitly models channel relationships with explainable control variables. These variables determine whether neurons compete or cooperate, and they are jointly optimized with the convolutional weights towards more accurate recognition. In Squeeze-and-Excitation (SE) Networks, channel relationships are learned implicitly by fully connected layers, and the SE block is integrated at the block level. We instead introduce a channel normalization layer to reduce the number of parameters and the computational complexity. This lightweight layer incorporates a simple l2 normalization, which makes our transformation unit applicable at the operator level without much increase in additional parameters. Extensive experiments demonstrate the effectiveness of our unit, with clear margins, on many vision tasks, namely image classification on ImageNet, object detection and instance segmentation on COCO, and video classification on Kinetics.
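The abstract does not spell out the unit's equations, so the following is only a minimal sketch of how a gated channel transformation of this kind might look, not the authors' implementation. It assumes three per-channel control variables (hypothetically named alpha, gamma, and beta here), an l2 norm over each channel's spatial positions as the channel embedding, an l2 normalization of those embeddings across channels, and a tanh-based gate. The function name, the flat-list data layout, and the eps constant are illustrative choices.

```python
import math


def gated_channel_transform(x, alpha, gamma, beta, eps=1e-5):
    """Apply a per-channel gate to a feature map (illustrative sketch).

    x:     list of C channels, each a flat list of spatial activations.
    alpha: per-channel embedding weights (assumed name).
    gamma: per-channel gating weights (assumed name).
    beta:  per-channel gating biases (assumed name).
    """
    num_channels = len(x)

    # Channel embedding: scaled l2 norm of each channel over its spatial positions.
    s = [a * math.sqrt(sum(v * v for v in ch) + eps) for a, ch in zip(alpha, x)]

    # Channel normalization: divide by the root mean square of the embeddings
    # across channels (an l2 normalization rescaled by sqrt(C)).
    scale = math.sqrt(sum(v * v for v in s) / num_channels + eps)
    s_hat = [v / scale for v in s]

    # Gating: 1 + tanh(.) lies in (0, 2), so gamma = beta = 0 gives a gate of 1.
    gates = [1.0 + math.tanh(g * v + b) for g, v, b in zip(gamma, s_hat, beta)]

    # Rescale every activation of a channel by that channel's gate.
    return [[gate * v for v in ch] for gate, ch in zip(gates, x)]


if __name__ == "__main__":
    feature_map = [[1.0, 1.0], [3.0, 3.0]]  # two channels, two positions each
    out = gated_channel_transform(
        feature_map, alpha=[1.0, 1.0], gamma=[1.0, 1.0], beta=[0.0, 0.0]
    )
    print(out)
```

In this sketch, setting gamma and beta to zero leaves the input unchanged, so the unit could be dropped in after a convolution without altering the network at initialization. With a positive gamma, channels with larger normalized embeddings receive larger gates, which is one way to read the competition behavior mentioned in the abstract; a negative gamma reverses that ordering.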

updated: Fri Mar 27 2020 10:08:39 GMT+0000 (UTC)

published: Wed Sep 25 2019 14:26:32 GMT+0000 (UTC)

arXiv
