Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance

Chen Tang; Kai Ouyang; Zhi Wang; Yifei Zhu; Yaowei Wang; Wen Ji; Wenwu Zhu

The exponentially large discrete search space in mixed-precision quantization (MPQ) makes it hard to determine the optimal bit-width for each layer. Previous works usually resort to iterative search methods on the training set, which consume hundreds or even thousands of GPU-hours. In this study, we reveal that some unique learnable parameters in quantization, namely the scale factors in the quantizer, can serve as importance indicators of a layer, reflecting the contribution of that layer to the final accuracy at certain bit-widths. These importance indicators naturally perceive the numerical transformation during quantization-aware training, so they can precisely provide quantization sensitivity metrics of layers. However, a deep network typically contains hundreds of such indicators, and training them one by one would incur an excessive time cost. To overcome this issue, we propose a joint training scheme that obtains all indicators at once, considerably speeding up indicator training by parallelizing the original sequential training processes. With these learned importance indicators, we formulate the MPQ search problem as a one-time integer linear programming (ILP) problem. This avoids iterative search and significantly reduces search time without limiting the bit-width search space. For example, MPQ search on ResNet18 with our indicators takes only 0.06 s, which improves time efficiency exponentially compared to iterative search methods. Extensive experiments also show that our approach can achieve SOTA accuracy on ImageNet for a wide range of models under various constraints (e.g., BitOps, compression rate). Code is available at https://github.com/1hunters/LIMPQ.
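
The one-time search step can be illustrated with a minimal sketch, assuming each layer already has a scalar importance score per candidate bit-width (higher means a larger contribution to accuracy) and an integer resource cost standing in for BitOps, with the goal of maximizing total importance under a budget while picking exactly one bit-width per layer. This is not the authors' objective or code (see the repository above for that): the scores and layer sizes below are hypothetical, and the function `select_bitwidths` is a name introduced here for illustration. Instead of calling a general ILP solver, it solves this special case (a multiple-choice knapsack with integer costs) exactly by dynamic programming over the used budget.

```python
from typing import Dict, List, Sequence, Tuple


def select_bitwidths(
    importance: Sequence[Dict[int, float]],
    cost: Sequence[Dict[int, int]],
    budget: int,
) -> Tuple[float, List[int]]:
    """Choose one bit-width per layer, maximizing summed importance with total cost <= budget.

    Hypothetical helper: importance[l][b] and cost[l][b] are assumed inputs,
    not the learned indicators or BitOps model from the paper.
    """
    # best[c] = (score, choices) of the best partial assignment whose total cost is exactly c
    best: Dict[int, Tuple[float, List[int]]] = {0: (0.0, [])}
    for layer_imp, layer_cost in zip(importance, cost):
        nxt: Dict[int, Tuple[float, List[int]]] = {}
        for used, (score, choices) in best.items():
            for bits, imp in layer_imp.items():
                total = used + layer_cost[bits]
                if total > budget:
                    continue  # this choice would exceed the resource budget
                cand = (score + imp, choices + [bits])
                # Keep only the highest-scoring partial assignment per exact cost.
                if total not in nxt or cand[0] > nxt[total][0]:
                    nxt[total] = cand
        best = nxt
        if not best:
            raise ValueError("no bit-width assignment fits the budget")
    # The optimum is the best score over every feasible total cost.
    return max(best.values(), key=lambda t: t[0])


# Hypothetical importance scores for three layers at 2, 4 and 8 bits.
importance = [
    {2: 0.10, 4: 0.60, 8: 0.70},
    {2: 0.40, 4: 0.50, 8: 0.55},
    {2: 0.05, 4: 0.50, 8: 0.90},
]
layer_sizes = [1, 2, 1]  # hypothetical relative layer sizes
cost = [{b: size * b for b in (2, 4, 8)} for size in layer_sizes]

score, bits = select_bitwidths(importance, cost, budget=16)
print(bits)  # [4, 2, 8]
```

Because the search runs once over precomputed scores rather than retraining or re-evaluating the network per candidate, its cost depends only on the number of layers, candidate bit-widths, and budget granularity. A general ILP solver handles the same formulation with multiple simultaneous constraints (for example, BitOps and model size together).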

updated: Sun Mar 05 2023 09:08:51 GMT+0000 (UTC)

published: Wed Mar 16 2022 03:23:50 GMT+0000 (UTC)

arXiv