A Generalized Zero-Shot Quantization of Deep Convolutional Neural Networks via Learned Weights Statistics

Prasen Kumar Sharma; Arun Abraham; Vikram Nelvoy Rajendiran

Quantizing the floating-point weights and activations of deep convolutional neural networks to a fixed-point representation reduces memory footprint and inference time. Recent work has moved towards zero-shot quantization, which does not require the original unlabelled training samples of a given task. The best published methods rely heavily on the learned batch normalization (BN) parameters to infer the range of the activations for quantization. In particular, these methods build on either an empirical estimation framework or a data distillation approach to compute the activation ranges. However, the performance of such schemes degrades severely when they are applied to a network that does not contain BN layers. To address this, we propose a generalized zero-shot quantization (GZSQ) framework that neither requires the original data nor relies on BN layer statistics. We use the data distillation approach and leverage only the pre-trained weights of the model to estimate enriched data for range calibration of the activations. To the best of our knowledge, this is the first work that uses the distribution of the pre-trained weights to assist zero-shot quantization. The proposed scheme significantly outperforms existing zero-shot works, e.g., an improvement of ~33% in classification accuracy for MobileNetV2 and several other models, both with and without BN layers, across a variety of tasks. We also demonstrate the efficacy of the proposed work across multiple open-source quantization frameworks. Importantly, our work is the first attempt at post-training zero-shot quantization of futuristic unnormalized deep neural networks.
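
To make the role of activation range calibration concrete, the following is a minimal sketch of generic asymmetric uniform quantization to 8 bits. It is not the GZSQ procedure: the abstract does not specify how the distilled data are produced, so the calibration samples here are simply a hand-written list, and all function names (`calibrate_range`, `quant_params`, `quantize`, `dequantize`) are hypothetical and chosen for illustration only.

```python
def calibrate_range(samples):
    """Return the (min, max) range observed over calibration activations.

    Assumption: `samples` stands in for activations collected on
    calibration data (real or distilled); it is not the paper's data.
    """
    lo = min(samples)
    hi = max(samples)
    # Widen the range to include zero so 0.0 maps to an exact integer.
    return min(lo, 0.0), max(hi, 0.0)


def quant_params(lo, hi, num_bits=8):
    """Compute scale and zero-point for asymmetric uniform quantization."""
    qmax = 2 ** num_bits - 1
    # Fall back to a scale of 1.0 for a degenerate (zero-width) range.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    return scale, zero_point


def quantize(x, scale, zero_point, num_bits=8):
    """Map a float to an unsigned integer code, clipping to the valid range."""
    qmax = 2 ** num_bits - 1
    q = int(round(x / scale)) + zero_point
    return max(0, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to its approximate float value."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    activations = [-1.0, 0.5, 2.0, 3.0]  # illustrative values only
    lo, hi = calibrate_range(activations)
    scale, zp = quant_params(lo, hi)
    for a in activations:
        q = quantize(a, scale, zp)
        print(a, q, dequantize(q, scale, zp))
```

The sketch shows why the calibrated range matters: values outside `[lo, hi]` are clipped, while an overly wide range inflates `scale` and coarsens every in-range value. Zero-shot methods differ in how they obtain that range without access to the original training data.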

updated: Mon Dec 06 2021 07:41:16 GMT+0000 (UTC)

published: Mon Dec 06 2021 07:41:16 GMT+0000 (UTC)

arXiv
