PD-Quant: Post-Training Quantization based on Prediction Difference Metric

Jiawei Liu; Lin Niu; Zhihang Yuan; Dawei Yang; Xinggang Wang; Wenyu Liu

Post-training quantization (PTQ) is a neural network compression technique that converts a full-precision model into a quantized model using lower-precision data types. Although it can reduce the size and computational cost of deep neural networks, it also introduces quantization noise and can reduce prediction accuracy, especially in extremely low-bit settings. The main open problem is how to determine appropriate quantization parameters (e.g., scaling factors and the rounding of weights). Existing methods attempt to determine these parameters by minimizing the distance between features before and after quantization, but such an approach considers only local information and may not yield optimal quantization parameters. We analyze this issue and propose PD-Quant, a method that addresses this limitation by considering global information. It determines the quantization parameters using the difference between the network's predictions before and after quantization. In addition, PD-Quant alleviates the overfitting problem in PTQ caused by the small size of the calibration set by adjusting the distribution of activations. Experiments show that PD-Quant leads to better quantization parameters and improves the prediction accuracy of quantized models, especially in low-bit settings. For example, PD-Quant pushes the accuracy of ResNet-18 up to 53.14% and that of RegNetX-600MF up to 40.67% with 2-bit weights and 2-bit activations. The code is released at https://github.com/hustvl/PD-Quant.
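The contrast between a local feature-distance objective and a prediction-difference objective can be illustrated with a toy sketch. This is not the authors' implementation: the one-layer "head" `W2`, the random activations, the grid of candidate scales, and the use of KL divergence between softmax outputs as the prediction-difference measure are all assumptions made here for illustration. It only shows how the two criteria can be evaluated for the same activation scaling factor.

```python
import math
import random

def quantize(x, scale, bits=2):
    """Uniform signed quantization: round to the grid, clamp, de-quantize."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [scale * max(qmin, min(qmax, round(v / scale))) for v in x]

def linear(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def kl_divergence(p, q, eps=1e-12):
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def local_loss(h, hq, W2):
    """Local criterion: mean squared distance between features (ignores W2)."""
    return sum((a - b) ** 2 for a, b in zip(h, hq)) / len(h)

def pd_loss(h, hq, W2):
    """Assumed prediction-difference criterion: KL between final predictions."""
    return kl_divergence(softmax(linear(W2, h)), softmax(linear(W2, hq)))

def total_loss(samples, W2, scale, loss):
    return sum(loss(h, quantize(h, scale), W2) for h in samples)

def search_scale(samples, W2, candidates, loss):
    """Grid search: return the candidate scale with the smallest total loss."""
    return min(candidates, key=lambda s: total_loss(samples, W2, s, loss))

# Hypothetical toy data: 32 calibration activations of width 8, 4 output classes.
rng = random.Random(0)
W2 = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
samples = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(32)]
candidates = [0.1 * k for k in range(1, 31)]

scale_local = search_scale(samples, W2, candidates, local_loss)
scale_pd = search_scale(samples, W2, candidates, pd_loss)
print("scale chosen by feature distance:     ", scale_local)
print("scale chosen by prediction difference:", scale_pd)
```

The two searches need not agree: the feature-distance search never looks at the layers after the quantized activation, while the prediction-difference search weighs each quantization error by its effect on the output. A real PTQ pipeline would optimize the parameters block by block on a trained network rather than by grid search on random data.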

updated: Mon Mar 27 2023 05:47:22 GMT+0000 (UTC)

published: Wed Dec 14 2022 05:48:58 GMT+0000 (UTC)

arXiv
