Data-free mixed-precision quantization using novel sensitivity metric

Donghyun Lee; Minkyoung Cho; Seungwon Lee; Joonho Song; Changkyu Choi

Post-training quantization is a representative technique for compressing neural networks, making them smaller and more efficient for deployment on edge devices. However, the user dataset is often inaccessible, which makes it difficult to ensure the quality of the quantized neural network in practice. In addition, existing approaches may use a single uniform bit-width across the network, resulting in significant accuracy degradation at extremely low bit-widths. When multiple bit-widths are used, the sensitivity metric plays a key role in balancing accuracy and compression. In this paper, we propose a novel sensitivity metric that considers the effect of quantization error on the task loss and its interaction with other layers. Moreover, we develop labeled data generation methods that do not depend on any specific operation of the neural network. Our experiments show that the proposed metric better represents quantization sensitivity, and that the generated data are better suited to mixed-precision quantization.
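To make the idea of sensitivity-driven bit-width selection concrete, here is a minimal sketch in plain Python. It is not the metric proposed in the paper, which the abstract does not specify. As a stand-in it uses the mean squared quantization error of each layer's weights, and it assigns bit-widths greedily under an unweighted average-bit budget. The function names, the layer names, and the budget rule are all assumptions made for illustration.

```python
import random


def quantize(values, bits):
    """Symmetric uniform quantization of a list of floats to `bits` bits."""
    levels = 2 ** (bits - 1) - 1  # e.g. 127 for 8 bits, 1 for 2 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]


def toy_sensitivity(weights, bits):
    """Hypothetical proxy: mean squared error between weights and their
    quantized version. The paper's metric additionally accounts for task
    loss and inter-layer interaction, which this proxy ignores."""
    quantized = quantize(weights, bits)
    return sum((w - q) ** 2 for w, q in zip(weights, quantized)) / len(weights)


def assign_bits(layers, choices=(2, 4, 6, 8), avg_budget=4.0):
    """Greedy sketch: start every layer at the lowest bit-width, then keep
    upgrading the currently most sensitive layer while the (unweighted)
    average bit-width stays within `avg_budget`."""
    choices = sorted(choices)
    bits = {name: choices[0] for name in layers}
    while True:
        # Collect layers whose next upgrade still fits the budget.
        candidates = []
        for name in layers:
            idx = choices.index(bits[name])
            if idx + 1 == len(choices):
                continue  # already at the highest bit-width
            new_total = sum(bits.values()) - bits[name] + choices[idx + 1]
            if new_total / len(layers) <= avg_budget:
                candidates.append(name)
        if not candidates:
            return bits
        worst = max(candidates, key=lambda n: toy_sensitivity(layers[n], bits[n]))
        bits[worst] = choices[choices.index(bits[worst]) + 1]


# Synthetic "layers" with different weight scales (illustrative only).
rng = random.Random(0)
layers = {
    "conv1": [rng.gauss(0.0, 1.0) for _ in range(256)],
    "conv2": [rng.gauss(0.0, 0.1) for _ in range(256)],
    "fc": [rng.gauss(0.0, 0.5) for _ in range(256)],
}
bits = assign_bits(layers)
print(bits)
```

In this sketch, layers with a larger quantization error receive more bits first, which is the general role a sensitivity metric plays in mixed-precision quantization. A more faithful metric would measure the change in task loss on (generated) data rather than weight error alone.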

updated: Thu Mar 18 2021 07:23:21 GMT+0000 (UTC)

published: Thu Mar 18 2021 07:23:21 GMT+0000 (UTC)

arXiv
