Loss Aware Post-training Quantization

Yury Nahshan; Brian Chmiel; Chaim Baskin; Evgenii Zheltonozhskii; Ron Banner; Alex M. Bronstein; Avi Mendelson

Neural network quantization enables the deployment of large models on resource-constrained devices. Current post-training quantization methods fall short in terms of accuracy for INT4 (or lower) but provide reasonable accuracy for INT8 (or above). In this work, we study the effect of quantization on the structure of the loss landscape. Additionally, we show that the structure is flat and separable for mild quantization, enabling straightforward post-training quantization methods to achieve good results. We show that with more aggressive quantization, the loss landscape becomes highly non-separable with steep curvature, making the selection of quantization parameters more challenging. Armed with this understanding, we design a method that quantizes the layer parameters jointly, enabling significant accuracy improvement over current post-training quantization methods. Reference implementation is available at https://github.com/ynahshan/nn-quantization-pytorch/tree/master/lapq
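The gap between mild and aggressive quantization described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it quantizes a single synthetic weight tensor with a symmetric uniform quantizer and picks the clipping threshold by a simple grid search on mean squared error, without looking at the network loss or at interactions between layers. The names `quantize`, `mse`, and `best_clip`, the Gaussian weights, and the grid size are all assumptions made for illustration.

```python
import random


def quantize(values, num_bits, clip):
    """Symmetric uniform quantization: clamp to [-clip, clip], then round to a grid."""
    levels = 2 ** (num_bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    step = clip / levels              # distance between adjacent grid points
    quantized = []
    for v in values:
        clamped = max(-clip, min(clip, v))
        quantized.append(round(clamped / step) * step)
    return quantized


def mse(a, b):
    """Mean squared error between two equally sized sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_clip(values, num_bits, candidates=50):
    """Grid-search the clipping threshold that minimizes the quantization MSE."""
    max_abs = max(abs(v) for v in values)
    grid = [max_abs * (i + 1) / candidates for i in range(candidates)]
    return min(grid, key=lambda c: mse(values, quantize(values, num_bits, c)))


# Synthetic stand-in for one layer's weights (assumption: roughly Gaussian).
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(2000)]

errors = {}
for bits in (8, 4):
    clip = best_clip(weights, bits)
    errors[bits] = mse(weights, quantize(weights, bits, clip))
    print(f"{bits}-bit: clip={clip:.3f}, mse={errors[bits]:.6f}")
```

In this toy setting the 4-bit error is much larger than the 8-bit error and depends more strongly on the chosen threshold. The abstract's argument goes further: at low bit widths the thresholds of different layers interact through the loss, so choosing each one independently, as this sketch does, is no longer adequate.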

updated: Mon Mar 16 2020 09:23:27 GMT+0000 (UTC)

published: Sun Nov 17 2019 09:10:23 GMT+0000 (UTC)

arXiv
