Deep Model Compression based on the Training History

S. H. Shabbeer Basha; Mohammad Farazuddin; Viswanath Pulabaigari; Shiv Ram Dubey; Snehasis Mukherjee


Deep Convolutional Neural Networks (DCNNs) have shown promising performance on several visual recognition problems, which has motivated researchers to propose popular architectures such as LeNet, AlexNet, VGGNet, and ResNet, among others. These architectures come at the cost of high computational complexity and parameter storage. To reduce storage and computational cost, deep model compression methods have evolved. We propose a "History Based Filter Pruning (HBFP)" method that uses the network training history for filter pruning. Specifically, we prune redundant filters by observing similar patterns in the filters' L1-norms (sum of absolute weights) over the training epochs. We iteratively prune the redundant filters of a CNN in three steps. First, we train the model and select filter pairs in which the two filters are redundant. Next, we optimize the network to increase the similarity between the filters in each pair. This optimization lets us prune one filter from each pair, chosen by its importance, without much information loss. Finally, we retrain the network to regain the performance lost to filter pruning. We test our approach on popular architectures: LeNet-5 on the MNIST dataset; VGG-16, ResNet-56, and ResNet-110 on the CIFAR-10 dataset; and ResNet-50 on ImageNet. The proposed pruning method outperforms the state of the art in FLOPs (floating-point operations) reduction, reducing FLOPs by 97.98%, 83.42%, 78.43%, 74.95%, and 75.45% for LeNet-5, VGG-16, ResNet-56, ResNet-110, and ResNet-50, respectively, while maintaining a low error rate.
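The pair-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the similarity measure or the importance criterion, so this sketch makes two assumptions: similarity is taken to be the summed absolute difference between two filters' per-epoch L1-norm histories, and the less important filter of a pair is the one with the smaller final L1-norm. The function names and toy numbers are hypothetical, and the optimization and retraining steps are omitted.

```python
from itertools import combinations


def l1_norm(filter_weights):
    """L1-norm of one filter: sum of absolute weights (flat list of numbers)."""
    return sum(abs(w) for w in filter_weights)


def history_distance(hist_a, hist_b):
    """Assumed dissimilarity: summed absolute gap between two L1-norm histories."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


def select_redundant_pairs(histories, num_pairs):
    """Greedily pick disjoint filter pairs whose histories are most similar.

    histories[i] holds the L1-norm of filter i, one entry per training epoch.
    """
    ranked = sorted(
        combinations(range(len(histories)), 2),
        key=lambda p: history_distance(histories[p[0]], histories[p[1]]),
    )
    used, pairs = set(), []
    for i, j in ranked:
        if len(pairs) == num_pairs:
            break
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return pairs


def filters_to_prune(pairs, histories):
    """Assumed importance rule: drop the filter with the smaller final L1-norm."""
    return [i if histories[i][-1] < histories[j][-1] else j for i, j in pairs]


# Toy history: 4 filters tracked over 3 epochs (hypothetical numbers).
histories = [
    [1.00, 1.20, 1.50],
    [1.02, 1.21, 1.49],
    [3.00, 2.50, 2.00],
    [0.20, 0.90, 2.40],
]
pairs = select_redundant_pairs(histories, num_pairs=1)
pruned = filters_to_prune(pairs, histories)
```

In this toy example, filters 0 and 1 follow nearly identical L1-norm trajectories, so they form the selected pair, and filter 1 is marked for pruning because its final norm is slightly smaller. In practice the histories would be recorded from the convolutional layers of a real network during training.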

updated: Thu May 12 2022 08:59:31 GMT+0000 (UTC)

published: Sat Jan 30 2021 06:04:21 GMT+0000 (UTC)

arXiv

