Learning Low-Rank Representations for Model Compression

Zezhou Zhu; Yucong Zhou; Zhao Zhong

Vector Quantization (VQ) is an appealing model compression method for obtaining a tiny model with little accuracy loss. While methods to obtain better codebooks and codes under fixed clustering dimensionality have been extensively studied, optimizations of the vectors in favour of clustering performance have not been carefully considered, especially via the reduction of vector dimensionality. This paper reports our recent progress on the combination of dimensionality compression and vector quantization, proposing a Low-Rank Representation Vector Quantization (LR^2VQ) method that outperforms previous VQ algorithms across various tasks and architectures. LR^2VQ joins low-rank representation with subvector clustering to construct a new kind of building block that is directly optimized through end-to-end training over the task loss. Our proposed design pattern introduces three hyper-parameters: the number of clusters k, the size of subvectors m, and the clustering dimensionality d. In our method, the compression ratio can be directly controlled by m, and the final accuracy is solely determined by d. We recognize d as a trade-off between low-rank approximation error and clustering error, and carry out both theoretical analysis and experimental observations that enable the estimation of a proper d before fine-tuning. With a proper d, we evaluate LR^2VQ with ResNet-18/ResNet-50 on the ImageNet classification dataset, achieving 2.8%/1.0% top-1 accuracy improvements over the current state-of-the-art VQ-based compression algorithms at 43×/31× compression factors.
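
The claim that m controls the compression ratio while d does not can be illustrated with a rough storage count. The sketch below is not taken from the paper: it assumes a hypothetical per-layer layout with one code of ceil(log2(k)) bits per size-m subvector, one k × d codebook, and one d × m projection matrix, all stored as 32-bit floats. The function name `compression_ratio` and the parameter `float_bits` are likewise illustrative.

```python
import math


def compression_ratio(n_weights, m, d, k, float_bits=32):
    """Estimate dense-to-compressed storage ratio for one layer.

    Assumed layout (an illustration, not the paper's exact scheme):
    one code per size-m subvector, a k x d codebook, a d x m projection.
    """
    n_subvectors = n_weights // m
    # Each subvector stores only the index of its cluster.
    code_bits = n_subvectors * math.ceil(math.log2(k))
    # Codebook and projection are shared, so they are a fixed overhead.
    overhead_bits = float_bits * (k * d + d * m)
    return (n_weights * float_bits) / (code_bits + overhead_bits)


# Doubling m roughly halves the code storage, which dominates for large layers.
n = 2 ** 20
for m in (4, 8, 16):
    print(m, round(compression_ratio(n, m, d=4, k=256), 1))
```

Under these assumptions, the code storage shrinks in proportion to 1/m, whereas d only enters through the shared codebook and projection, which are small relative to the codes for a large layer.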

updated: Mon Nov 21 2022 12:15:28 GMT+0000 (UTC)

published: Mon Nov 21 2022 12:15:28 GMT+0000 (UTC)

arXiv
