PD-Quant: Post-Training Quantization based on Prediction Difference Metric

Jiawei Liu; Lin Niu; Zhihang Yuan; Dawei Yang; Xinggang Wang; Wenyu Liu

Post-training quantization (PTQ) is a neural network compression technique that converts a pre-trained model into a quantized model using a lower-precision data type. However, quantization noise degrades prediction accuracy, especially in extremely low-bit settings. The main open problem is how to determine appropriate quantization parameters (e.g., scaling factors and the rounding of weights). Many existing methods determine these parameters by minimizing the distance between features before and after quantization. Optimizing against this distance considers only local information. We analyze the problem with minimizing such local metrics and show that it does not yield optimal quantization parameters. Furthermore, the quantized model suffers from overfitting because PTQ uses only a small number of calibration samples. In this paper, we propose PD-Quant to address these problems. PD-Quant determines the quantization parameters using the difference between the network's predictions before and after quantization. To mitigate overfitting, PD-Quant adjusts the distribution of activations in PTQ. Experiments show that PD-Quant yields better quantization parameters and improves the prediction accuracy of quantized models, especially in low-bit settings. For example, with 2-bit weights and 2-bit activations, PD-Quant pushes the accuracy of ResNet-18 up to 53.08% and RegNetX-600MF up to 40.92%. The code will be released at https://github.com/hustvl/PD-Quant.
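The contrast the abstract draws between a local feature distance and a prediction difference can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the single linear layer standing in for "the rest of the network", the use of KL divergence as the prediction-difference measure, and the example numbers are all assumptions chosen for illustration.

```python
import math

def quantize(values, scale, bits):
    """Uniform symmetric quantization: snap to the grid, clip, and rescale."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [scale * min(max(round(v / scale), qmin), qmax) for v in values]

def linear(features, weights):
    """Hypothetical stand-in for the layers that follow the quantized feature."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def local_mse(features, scale, bits):
    """Local metric: distance between the feature before and after quantization."""
    q = quantize(features, scale, bits)
    return sum((a - b) ** 2 for a, b in zip(features, q)) / len(features)

def prediction_difference(features, weights, scale, bits):
    """Global metric (assumed here to be KL divergence) between the
    full-precision prediction and the prediction from quantized features."""
    p = softmax(linear(features, weights))
    q = softmax(linear(quantize(features, scale, bits), weights))
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def best_scale(candidates, metric):
    """Pick the candidate scaling factor that minimizes the given metric."""
    return min(candidates, key=metric)

# Illustrative values only; they do not come from the paper.
features = [0.9, -0.3, 2.4, 0.05]
weights = [[1.0, 0.0, 0.2, 3.0], [0.0, 1.0, 0.1, -3.0]]
candidates = [0.2, 0.4, 0.8, 1.2, 1.6]

scale_local = best_scale(candidates, lambda s: local_mse(features, s, 2))
scale_pd = best_scale(
    candidates, lambda s: prediction_difference(features, weights, s, 2)
)
print("scale chosen by local MSE:", scale_local)
print("scale chosen by prediction difference:", scale_pd)
```

The two criteria score the same candidate scaling factors differently because the prediction-based one weights each feature's error by its effect on the output, which is the sense in which a purely local distance ignores downstream information.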

updated: Fri Feb 24 2023 01:56:07 GMT+0000 (UTC)

published: Wed Dec 14 2022 05:48:58 GMT+0000 (UTC)

arXiv
