Deeply Shared Filter Bases for Parameter-Efficient Convolutional Neural Networks

Woochul Kang; Daeyeon Kim

Modern convolutional neural networks (CNNs) contain a large number of identical convolution blocks, and recursive sharing of parameters across these blocks has therefore been proposed to reduce the number of parameters. However, naive parameter sharing poses several challenges, such as limited representational power and vanishing/exploding gradients in the recursively shared parameters. In this paper, we present a recursive convolution block design and training method in which a recursively shareable part, or filter basis, is separated and learned while effectively avoiding the vanishing/exploding gradients problem during training. We show that this otherwise unwieldy problem can be controlled by enforcing the elements of the filter basis to be orthonormal, and we empirically demonstrate that the proposed orthogonality regularization improves the flow of gradients during training. Experimental results on image classification and object detection show that, unlike previous parameter-sharing approaches, our approach does not trade performance for parameter savings and consistently outperforms its overparameterized counterpart networks. This superior performance demonstrates that the proposed recursive convolution block design and orthogonality regularization not only prevent performance degradation but also consistently improve representation capability, even while a significant number of parameters are recursively shared.
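The two ideas in the abstract, a filter basis shared across recursive blocks and an orthogonality regularizer on that basis, can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. It assumes, hypothetically, that filters are flattened to vectors, that each block's filters are linear combinations of a shared basis B with block-specific coefficients, and that the regularizer is the squared Frobenius norm of B B^T − I; the paper's convolutional formulation and exact penalty may differ. The last helper uses a purely linear model (nonlinearities ignored) to show why repeatedly applying the same weights is norm-preserving only when they are orthogonal. All function names are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of an (n x k) matrix a and a (k x m) matrix b."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def layer_filters(coeffs: Matrix, basis: Matrix) -> Matrix:
    """Filters of one block: each row is a linear combination of the shared basis rows.

    coeffs: (num_filters x basis_size), specific to this block (not shared)
    basis:  (basis_size x filter_dim), the same object reused by every block
    """
    return matmul(coeffs, basis)


def orthogonality_penalty(basis: Matrix) -> float:
    """Squared Frobenius norm of B B^T - I; zero exactly when the rows of B are orthonormal."""
    gram = matmul(basis, transpose(basis))
    n = len(gram)
    return sum((gram[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n))


def norm_after_repeats(m: Matrix, v: Vector, depth: int) -> float:
    """Euclidean norm of v after applying the same linear map m `depth` times.

    A linear stand-in for a signal (or gradient) passing through recursively
    shared weights; nonlinearities are deliberately ignored.
    """
    for _ in range(depth):
        v = [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]
    return math.sqrt(sum(x * x for x in v))


if __name__ == "__main__":
    # Two orthonormal basis "filters" of length 3, shared by every block.
    shared_basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(layer_filters([[1.0, 2.0]], shared_basis))   # block 1: [[1.0, 2.0, 0.0]]
    print(layer_filters([[0.5, -1.0]], shared_basis))  # block 2 reuses the basis
    print(orthogonality_penalty(shared_basis))          # 0.0
    print(orthogonality_penalty([[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]))  # 3.0

    rotation = [[0.0, -1.0], [1.0, 0.0]]  # orthogonal: norm is preserved
    print(norm_after_repeats(rotation, [1.0, 0.0], 50))                     # 1.0
    print(norm_after_repeats([[1.1, 0.0], [0.0, 1.1]], [1.0, 0.0], 10))    # grows to about 2.59
    print(norm_after_repeats([[0.9, 0.0], [0.0, 0.9]], [1.0, 0.0], 10))    # shrinks to about 0.35
```

In this simplified linear picture, a map whose singular values are all 1 keeps the norm fixed at any depth, while a scale of 1.1 or 0.9 compounds geometrically with depth; a penalty that pushes the Gram matrix of the basis toward the identity is one simple way to encourage the former.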

updated: Sun Nov 21 2021 09:53:04 GMT+0000 (UTC)

published: Tue Jun 09 2020 06:09:42 GMT+0000 (UTC)

arXiv
