Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation

Zechun Liu; Kwang-Ting Cheng; Dong Huang; Eric Xing; Zhiqiang Shen

Nonuniform quantization strategies for compressing neural networks usually achieve better performance than their uniform counterparts because of their superior representational capacity. However, many nonuniform quantization methods overlook the complicated projection process needed to implement nonuniformly quantized weights and activations, which incurs non-negligible time and space overhead in hardware deployment. In this study, we propose Nonuniform-to-Uniform Quantization (N2UQ), a method that maintains the strong representation ability of nonuniform methods while being as hardware-friendly and efficient as uniform quantization for model inference. We achieve this by learning flexible, non-equidistant input thresholds that better fit the underlying distribution, while quantizing the real-valued inputs into equidistant output levels. To train the quantized network with learnable input thresholds, we introduce a generalized straight-through estimator (G-STE) for the otherwise intractable backward derivative calculation w.r.t. the threshold parameters. Additionally, we consider an entropy-preserving regularization to further reduce information loss in weight quantization. Even under the adverse constraint of uniformly quantized weights and activations, N2UQ outperforms state-of-the-art nonuniform quantization methods by 0.7~1.8% on ImageNet, demonstrating the contribution of the N2UQ design. Code will be made publicly available.
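
The core idea of the abstract, unevenly spaced input thresholds feeding evenly spaced output levels, can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation: the function name and the threshold values are hypothetical, the thresholds are fixed here rather than learned, and the G-STE backward pass and the entropy-preserving regularization are not shown.

```python
from bisect import bisect_right


def quantize_nonuniform_to_uniform(x, thresholds):
    """Map a real value to an integer level 0..len(thresholds).

    The level is the number of thresholds that are <= x. The thresholds
    may be unevenly spaced, but the output levels are consecutive
    integers, i.e. equidistant.
    """
    # thresholds must be sorted in ascending order for bisect to be valid
    return bisect_right(thresholds, x)


# Hypothetical, unevenly spaced thresholds: 3 thresholds give 4 levels (2 bits).
thresholds = [0.1, 0.3, 1.0]
inputs = [-0.5, 0.05, 0.2, 0.3, 0.9, 2.0]
levels = [quantize_nonuniform_to_uniform(v, thresholds) for v in inputs]
print(levels)  # [0, 0, 1, 2, 2, 3]
```

Because the outputs are plain consecutive integers, downstream arithmetic can treat them exactly like uniformly quantized values; only the placement of the thresholds carries the nonuniform fit to the input distribution.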

updated: Mon Nov 29 2021 18:59:55 GMT+0000 (UTC)

published: Mon Nov 29 2021 18:59:55 GMT+0000 (UTC)

arXiv
