Trainability Preserving Neural Structured Pruning

Huan Wang; Yun Fu

Several recent works find empirically that the finetuning learning rate is critical to final performance in structured neural network pruning. Further research attributes this to network trainability being broken by pruning, which calls for recovering trainability before finetuning. Existing attempts propose to exploit weight orthogonalization to achieve dynamical isometry for improved trainability. However, they only work for linear MLP networks. How to develop a filter pruning method that maintains or recovers trainability and scales to modern deep networks remains elusive. In this paper, we present trainability preserving pruning (TPP), a regularization-based structured pruning method that can effectively maintain trainability during sparsification. Specifically, TPP regularizes the Gram matrix of convolutional kernels so as to de-correlate the pruned filters from the kept filters. Besides the convolutional layers, we also propose to regularize the BN parameters to better preserve trainability. Empirically, TPP can compete with the ground-truth dynamical isometry recovery method on linear MLP networks. On non-linear networks (ResNet56/VGG19, CIFAR datasets), it outperforms counterpart solutions by a large margin. Moreover, TPP also works effectively with modern deep networks (ResNets) on ImageNet, delivering encouraging performance in comparison to many top-performing filter pruning methods. To the best of our knowledge, this is the first approach that effectively maintains trainability during pruning for large-scale deep neural networks.
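The abstract describes the Gram-matrix regularizer only in words. The following is a minimal NumPy sketch of one plausible form of such a penalty, not the authors' implementation. It assumes a convolutional weight of shape (filters, channels, kH, kW), flattens each filter into a row, and penalizes every Gram entry that involves at least one pruned filter. The function name `gram_decorrelation_penalty`, the masking rule, and the squared-sum form are illustrative assumptions; the BN-parameter regularization mentioned in the abstract is not covered.

```python
import numpy as np


def gram_decorrelation_penalty(weight, pruned_idx):
    """Hypothetical penalty on Gram entries that involve a pruned filter.

    weight:     array of shape (n_filters, channels, kH, kW)
    pruned_idx: indices of the filters scheduled for removal
    """
    n_filters = weight.shape[0]
    # One row per filter, so gram[i, j] is the inner product of filters i and j.
    w = weight.reshape(n_filters, -1)
    gram = w @ w.T

    # keep[i] is 1 for a kept filter and 0 for a pruned one.
    keep = np.ones(n_filters)
    keep[np.asarray(pruned_idx, dtype=int)] = 0.0

    # Assumption: mask[i, j] is 1 whenever filter i or filter j is pruned,
    # so kept-kept entries of the Gram matrix are left unconstrained.
    mask = 1.0 - np.outer(keep, keep)
    return float(np.sum((gram * mask) ** 2))


rng = np.random.default_rng(0)
weight = rng.standard_normal((8, 4, 3, 3))
pruned = [1, 5]

# Penalty for random filters, which are correlated with one another.
before = gram_decorrelation_penalty(weight, pruned)

# Zeroing the pruned filters removes every masked Gram entry.
weight_zeroed = weight.copy()
weight_zeroed[pruned] = 0.0
after = gram_decorrelation_penalty(weight_zeroed, pruned)

print(before > after, after)
```

Under this masking rule the penalty vanishes once the pruned filters carry no weight and no correlation with the kept ones, which is the point at which removing them leaves the kept filters' outputs unchanged. In a training loop, a term of this kind would be added to the task loss with a coefficient that is a further assumption here.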

updated: Fri Aug 19 2022 21:13:11 GMT+0000 (UTC)

published: Mon Jul 25 2022 21:15:47 GMT+0000 (UTC)

arXiv
