A White Paper on Neural Network Quantization

Markus Nagel; Marios Fournarakis; Rana Ali Amjad; Yelysei Bondarenko; Mart van Baalen; Tijmen Blankevoort

While neural networks have advanced the frontiers in many applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is key if we want to integrate modern networks into edge devices with strict power and compute requirements. Neural network quantization is one of the most effective ways of achieving these savings, but the additional noise it induces can lead to accuracy degradation. In this white paper, we introduce state-of-the-art algorithms for mitigating the impact of quantization noise on the network's performance while maintaining low-bit weights and activations. We start with a hardware-motivated introduction to quantization and then consider two main classes of algorithms: Post-Training Quantization (PTQ) and Quantization-Aware Training (QAT). PTQ requires no re-training or labeled data and is thus a lightweight push-button approach to quantization. In most cases, PTQ is sufficient for achieving 8-bit quantization with close to floating-point accuracy. QAT requires fine-tuning and access to labeled training data but enables lower-bit quantization with competitive results. For both solutions, we provide tested pipelines based on existing literature and extensive experimentation that lead to state-of-the-art performance for common deep learning models and tasks.

updated: Tue Jun 15 2021 17:12:42 GMT+0000 (UTC)

published: Tue Jun 15 2021 17:12:42 GMT+0000 (UTC)

arXiv
