An Inter-Layer Weight Prediction and Quantization for Deep Neural Networks based on a Smoothly Varying Weight Hypothesis

Kang-Ho Lee; JoonHyun Jeong; Sung-Ho Bae

Because deployment environments are often resource-constrained, network compression has become an important part of deep neural network research. In this paper, we propose a new compression method, Inter-Layer Weight Prediction (ILWP) with quantization, which quantizes the prediction residuals between the weights of all convolution layers and is based on the inter-frame prediction used in conventional video coding schemes. Furthermore, we observe a phenomenon we call the Smoothly Varying Weight Hypothesis (SVWH): the weights in adjacent convolution layers share strong similarity in shapes and values, i.e., the weights tend to vary smoothly along the layers. Based on SVWH, we propose a second ILWP and quantization method, which quantizes the prediction residuals between the weights of adjacent convolution layers. Since the predicted weight residuals tend to follow Laplace distributions with very low variance, weight quantization can be applied more effectively, producing more zero weights and improving the weight compression ratio. In addition, we propose a new inter-layer loss for eliminating non-texture bits, which enables us to store only texture bits more efficiently. That is, the proposed loss regularizes the weights such that collocated weights in two adjacent layers have the same values. Finally, we propose an ILWP with both the inter-layer loss and the quantization method. Our comprehensive experiments show that the proposed method achieves a much higher weight compression rate at the same accuracy level than previous quantization-based compression methods for deep neural networks.
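The adjacent-layer prediction idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each layer's weights are flattened to equal-length lists, uses a plain uniform quantizer with a hypothetical step size `step`, predicts each layer from the reconstructed previous layer (as a video codec would), and takes the inter-layer penalty to be a sum of squared differences, a form the abstract does not specify. All function names are illustrative.

```python
def quantize(values, step):
    """Uniform quantizer (assumed): nearest integer multiple of `step`."""
    return [round(v / step) for v in values]


def dequantize(indices, step):
    """Map integer indices back to weight values."""
    return [q * step for q in indices]


def encode_layers(layers, step):
    """Store layer 0 directly; store each later layer as the quantized
    residual against the reconstruction of the previous layer."""
    encoded = [quantize(layers[0], step)]
    prev = dequantize(encoded[0], step)
    for weights in layers[1:]:
        residual = [w - p for w, p in zip(weights, prev)]
        q = quantize(residual, step)
        encoded.append(q)
        # Track the decoder's view so quantization error does not accumulate.
        prev = [p + r for p, r in zip(prev, dequantize(q, step))]
    return encoded


def decode_layers(encoded, step):
    """Rebuild every layer by adding residuals to the previous layer."""
    layers = [dequantize(encoded[0], step)]
    for q in encoded[1:]:
        layers.append([p + r for p, r in zip(layers[-1], dequantize(q, step))])
    return layers


def zero_fraction(indices):
    """Share of quantized entries that are exactly zero."""
    return sum(1 for q in indices if q == 0) / len(indices)


def inter_layer_penalty(layers):
    """Assumed regularizer: squared difference of collocated weights
    in adjacent layers. It is zero when adjacent layers are identical."""
    return sum(
        (a - b) ** 2
        for prev, cur in zip(layers, layers[1:])
        for a, b in zip(prev, cur)
    )
```

When adjacent layers are similar, most residual indices returned by `encode_layers` are zero, which is what makes them cheap to store with an entropy coder. Driving `inter_layer_penalty` toward zero during training pushes more residuals to exactly zero.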

updated: Thu Aug 20 2020 02:32:12 GMT+0000 (UTC)

published: Tue Jul 16 2019 04:44:59 GMT+0000 (UTC)

arXiv
