Elastic Significant Bit Quantization and Acceleration for Deep Neural Networks

Cheng Gong; Ye Lu; Kunpeng Xie; Zongming Jin; Tao Li; Yanzhi Wang

Quantization has been proven to be a vital method for improving the inference efficiency of deep neural networks (DNNs). However, it is still challenging to strike a good balance between accuracy and efficiency when quantizing DNN weights or activation values from high-precision formats to their quantized counterparts. We propose a new method called elastic significant bit quantization (ESB) that controls the number of significant bits of quantized values to obtain better inference accuracy with fewer resources. We design a unified mathematical formula to constrain the quantized values of ESB with a flexible number of significant bits. We also introduce a distribution difference aligner (DDA) to quantitatively align the distributions of the full-precision weight or activation values and the quantized values. Consequently, ESB is suitable for the various bell-shaped distributions of DNN weights and activations, and thus maintains high inference accuracy. Benefiting from the fewer significant bits of its quantized values, ESB can reduce the multiplication complexity. We implement ESB as an accelerator and quantitatively evaluate its efficiency on FPGAs. Extensive experimental results show that ESB quantization consistently outperforms state-of-the-art methods and achieves average accuracy improvements of 4.78%, 1.92%, and 3.56% on AlexNet, ResNet18, and MobileNetV2, respectively. Furthermore, the ESB accelerator achieves a peak performance of 10.95 GOPS per 1k LUTs without using DSPs on the Xilinx ZCU102 FPGA platform. Compared with CPU, GPU, and state-of-the-art FPGA accelerators, the ESB accelerator can improve energy efficiency by up to 65x, 11x, and 26x, respectively.
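The abstract does not state ESB's unified formula, so the following is only a minimal sketch under one assumed reading of "significant bits": the span of binary digits from a value's leading 1 to its trailing 1. Under that assumption, k = 1 yields power-of-two levels and k equal to the total bit width yields ordinary uniform levels. The names `significant_bits`, `esb_levels`, `quantize`, `search_scale`, and `shift_add_multiply` are hypothetical. `search_scale` is a plain mean-squared-error grid search used as a stand-in for the distribution difference aligner (DDA); it is not the paper's DDA. The last function illustrates why fewer significant bits can reduce multiplication complexity: multiplying by a code with at most k significant bits needs at most k shift-and-add partial products.

```python
import numpy as np


def significant_bits(n: int) -> int:
    """Bits from the leading 1 to the trailing 1 of a non-negative int (0 for n == 0).

    Assumed definition for illustration only.
    """
    n = int(n)
    if n == 0:
        return 0
    trailing_zeros = (n & -n).bit_length() - 1
    return n.bit_length() - trailing_zeros


def esb_levels(total_bits: int, k: int) -> np.ndarray:
    """Hypothetical level grid: magnitudes below 2**total_bits with at most k significant bits."""
    return np.array([n for n in range(2 ** total_bits) if significant_bits(n) <= k])


def quantize(x: np.ndarray, scale: float, levels: np.ndarray) -> np.ndarray:
    """Map each |x| / scale to the nearest level (saturating at the largest), keeping the sign."""
    mag = np.abs(x) / scale
    idx = np.abs(mag[..., None] - levels[None, :]).argmin(axis=-1)
    return np.sign(x).astype(int) * levels[idx]


def search_scale(x: np.ndarray, levels: np.ndarray, candidates: np.ndarray) -> float:
    """Stand-in for DDA (assumption): pick the scale with the lowest reconstruction MSE."""
    best_scale, best_err = None, np.inf
    for s in candidates:
        err = np.mean((x - quantize(x, s, levels) * s) ** 2)
        if err < best_err:
            best_scale, best_err = s, err
    return best_scale


def shift_add_multiply(a: int, q: int) -> int:
    """Multiply a by integer code q using one shift-and-add per set bit of |q|."""
    sign = -1 if q < 0 else 1
    q = abs(int(q))
    result, shift = 0, 0
    while q:
        if q & 1:
            result += a << shift  # one partial product per set bit
        q >>= 1
        shift += 1
    return sign * result


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 1.0, size=1000)  # bell-shaped stand-in for DNN weights
    levels = esb_levels(total_bits=4, k=2)
    scale = search_scale(w, levels, np.linspace(0.05, 1.0, 96))
    codes = quantize(w, scale, levels)
    print("levels:", levels.tolist())
    print("scale:", scale)
    print("w[0] * 3 via shift-add on its code:", shift_add_multiply(3, int(codes[0])))
```

With `k=2` and 4 bits, every nonzero code has at most two set bits, so each multiplication in this sketch costs at most two shifts and one addition. Raising `k` trades that saving for a denser grid that can follow the weight distribution more closely.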

updated: Wed Sep 08 2021 09:13:59 GMT+0000 (UTC)

published: Wed Sep 08 2021 09:13:59 GMT+0000 (UTC)

arXiv
