UWC: Unit-wise Calibration Towards Rapid Network Compression

Chen Lin; Zheyang Li; Bo Peng; Haoji Hu; Wenming Tan; Ye Ren; Shiliang Pu

This paper introduces a post-training quantization (PTQ) method that quantizes convolutional neural networks (CNNs) efficiently while preserving accuracy. Previous PTQ methods usually reduce compression error by calibrating parameters layer by layer. However, extremely compressed parameters (e.g., at bit-widths below 4) have limited representational ability, so it is hard to eliminate every layer-wise error. This work addresses the issue with a unit-wise feature reconstruction algorithm, motivated by an observation from the second-order Taylor series expansion of the unit-wise error: exploiting the interaction between the parameters of adjacent layers compensates for layer-wise errors better. We define several adjacent layers as a Basic-Unit and present a unit-wise post-training algorithm that minimizes quantization error. The method achieves near-original accuracy on ImageNet and COCO when quantizing FP32 models to INT4 and INT3.
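To make the notion of unit-wise error concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a generic symmetric per-tensor uniform quantizer and a toy unit of two stacked linear+ReLU layers, and it only measures the error at the unit's output for several bit-widths. The names `quantize_uniform` and `unit_output_error`, the layer shapes, and the random data are all hypothetical choices made for illustration.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric per-tensor uniform quantization (assumed generic scheme)."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 7 for INT4, 3 for INT3
    scale = np.max(np.abs(w)) / qmax           # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                           # de-quantized weights

def unit_output_error(x, weights, bits):
    """MSE between full-precision and quantized outputs of one unit.

    The unit is a stack of linear+ReLU layers (a toy stand-in for several
    adjacent layers); the error is measured only at the unit's output.
    """
    ref, out = x, x
    for w in weights:
        ref = np.maximum(ref @ w, 0.0)
        out = np.maximum(out @ quantize_uniform(w, bits), 0.0)
    return float(np.mean((ref - out) ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 32))             # hypothetical calibration batch
unit = [rng.standard_normal((32, 32)) / np.sqrt(32) for _ in range(2)]

errors = {bits: unit_output_error(x, unit, bits) for bits in (8, 4, 3)}
for bits, err in errors.items():
    print(f"INT{bits}: unit output MSE = {err:.6f}")
```

In this toy setting the error at the unit's output grows as the bit-width shrinks. A unit-wise method would treat that output error as the quantity to minimize when calibrating the quantized parameters of all layers in the unit together, rather than correcting each layer in isolation.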

updated: Mon Jan 17 2022 12:27:35 GMT+0000 (UTC)

published: Mon Jan 17 2022 12:27:35 GMT+0000 (UTC)

arXiv
