Transform Quantization for CNN (Convolutional Neural Network) Compression

Sean I. Young; Wang Zhe; David Taubman; Bernd Girod

In this paper, we compress convolutional neural network (CNN) weights post-training via transform quantization. Previous CNN quantization techniques tend to ignore the joint statistics of weights and activations, producing sub-optimal CNN performance at a given quantization bit-rate, or consider their joint statistics during training only and do not facilitate efficient compression of already trained CNN models. We optimally transform (decorrelate) and quantize the weights post-training using a rate-distortion framework to improve compression at any given quantization bit-rate. Transform quantization unifies quantization and dimensionality reduction (decorrelation) techniques in a single framework to facilitate low bit-rate compression of CNNs and efficient inference in the transform domain. We first introduce a theory of rate and distortion for CNN quantization, and pose optimum quantization as a rate-distortion optimization problem. We then show that this problem can be solved using optimal bit-depth allocation following decorrelation by the optimal End-to-end Learned Transform (ELT) we derive in this paper. Experiments demonstrate that transform quantization advances the state of the art in CNN compression in both retrained and non-retrained quantization scenarios. In particular, we find that transform quantization with retraining is able to compress CNN models such as AlexNet, ResNet and DenseNet to very low bit-rates (1-2 bits).
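The bit-depth allocation step mentioned in the abstract can be illustrated with the classical high-rate rule from transform coding. This is only a sketch under textbook assumptions, not the paper's ELT derivation: it assumes the weights have already been decorrelated, that each transform coefficient i has a known variance, and that distortion follows the model D_i = var_i * 2^(-2 * b_i). The function names `allocate_bit_depths` and `model_distortion` are hypothetical and chosen for illustration.

```python
import math


def allocate_bit_depths(variances, mean_bits):
    """Classical high-rate bit allocation (an assumption, not the paper's method).

    Each coefficient receives
        b_i = mean_bits + 0.5 * log2(var_i / geometric_mean(variances)),
    so higher-variance coefficients get more bits and the average stays
    at mean_bits. The rule can return negative depths for very small
    variances; practical schemes clamp those to zero and redistribute.
    """
    log_vars = [math.log2(v) for v in variances]
    mean_log = sum(log_vars) / len(log_vars)
    return [mean_bits + 0.5 * (lv - mean_log) for lv in log_vars]


def model_distortion(variances, bits):
    """Total distortion under the high-rate model D_i = var_i * 2**(-2 * b_i)."""
    return sum(v * 2.0 ** (-2.0 * b) for v, b in zip(variances, bits))


if __name__ == "__main__":
    # Hypothetical variances of four decorrelated coefficients.
    variances = [16.0, 4.0, 1.0, 0.25]
    budget = 2.0  # average bits per coefficient

    optimal = allocate_bit_depths(variances, budget)
    uniform = [budget] * len(variances)

    print("optimal bit depths:", optimal)
    print("distortion, optimal allocation:", model_distortion(variances, optimal))
    print("distortion, uniform allocation:", model_distortion(variances, uniform))
```

Under this model the allocation equalizes the per-coefficient distortion, which is why it beats giving every coefficient the same bit depth at the same average rate. Decorrelation matters because it concentrates variance in a few coefficients, which widens the gap between the two allocations.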

updated: Sun Nov 07 2021 17:29:26 GMT+0000 (UTC)

published: Wed Sep 02 2020 16:33:42 GMT+0000 (UTC)

arXiv
