Layer Pruning via Fusible Residual Convolutional Block for Deep Neural Networks

Pengtao Xu; Jian Cao; Fanhua Shang; Wenyu Sun; Pu Li

To deploy deep convolutional neural networks (CNNs) on resource-limited devices, many model pruning methods for filters and weights have been developed, but only a few address layer pruning. Yet when the same number of FLOPs and parameters is pruned, the compact model obtained by layer pruning has lower inference time and run-time memory usage than one obtained by filter or weight pruning, because less data is moved in memory. In this paper, we propose a simple layer pruning method using a fusible residual convolutional block (ResConv), which is implemented by inserting a shortcut connection with a trainable information control parameter into a single convolutional layer. Using ResConv structures in training can improve network accuracy and makes it possible to train deep plain networks, and it adds no computation during inference because each ResConv is fused into an ordinary convolutional layer after training. For layer pruning, we convert the convolutional layers of a network into ResConv blocks with a layer scaling factor. During training, L1 regularization is applied to make the scaling factors sparse, so that unimportant layers are automatically identified and then removed, resulting in a model with fewer layers. Our pruning method achieves excellent compression and acceleration performance compared with state-of-the-art methods on different datasets, and needs no retraining when the pruning rate is low. For example, with ResNet-110, we achieve a 65.5% reduction in FLOPs by removing 55.5% of the parameters, with a loss of only 0.13% in top-1 accuracy on CIFAR-10.
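The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the block computes y = alpha * x + s * conv(x), where alpha stands in for the information control parameter and s for the layer scaling factor, and it uses a 1-D convolution with stride 1, zero "same" padding, and equal input and output channel counts. The names conv1d_same, resconv, and fuse are hypothetical. Under these assumptions the shortcut is itself a convolution with a Dirac (identity) kernel, so it can be added into the weights once training is done.

```python
import random

def conv1d_same(x, w):
    """Cross-correlation with stride 1 and zero 'same' padding.
    x: [C_in][T], w: [C_out][C_in][K] with odd K."""
    c_out, c_in, k = len(w), len(w[0]), len(w[0][0])
    t, pad = len(x[0]), k // 2
    y = [[0.0] * t for _ in range(c_out)]
    for o in range(c_out):
        for i in range(c_in):
            for n in range(t):
                for j in range(k):
                    m = n + j - pad
                    if 0 <= m < t:
                        y[o][n] += w[o][i][j] * x[i][m]
    return y

def resconv(x, w, alpha, s):
    """Assumed training-time form: shortcut scaled by alpha plus conv scaled by s."""
    conv = conv1d_same(x, w)
    return [[alpha * x[c][n] + s * conv[c][n] for n in range(len(x[0]))]
            for c in range(len(x))]

def fuse(w, alpha, s):
    """Fold the shortcut into the kernel: s * w plus alpha at the centre tap
    of each channel's own filter (requires C_out == C_in)."""
    k = len(w[0][0])
    fused = [[[s * w[o][i][j] for j in range(k)] for i in range(len(w[o]))]
             for o in range(len(w))]
    for c in range(len(w)):
        fused[c][c][k // 2] += alpha
    return fused

# Illustrative values only; none of these come from the paper.
random.seed(0)
channels, length, ksize = 3, 8, 3
x = [[random.uniform(-1, 1) for _ in range(length)] for _ in range(channels)]
w = [[[random.uniform(-1, 1) for _ in range(ksize)] for _ in range(channels)]
     for _ in range(channels)]
alpha, s = 0.7, 0.4

y_train = resconv(x, w, alpha, s)               # two-branch block used in training
y_fused = conv1d_same(x, fuse(w, alpha, s))     # single ordinary conv at inference
max_diff = max(abs(a - b) for ra, rb in zip(y_train, y_fused) for a, b in zip(ra, rb))
```

In this sketch the two outputs agree up to floating-point rounding, which is why the fused layer costs nothing extra at inference. It also shows why a sparse scaling factor marks a layer as removable: when s is driven to zero, the block reduces to a scaled copy of its input.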

updated: Sun Nov 29 2020 12:51:16 GMT+0000 (UTC)

published: Sun Nov 29 2020 12:51:16 GMT+0000 (UTC)

arXiv

