DKM: Differentiable K-Means Clustering Layer for Neural Network Compression

Minsik Cho; Keivan A. Vahid; Saurabh Adya; Mohammad Rastegari

Deep neural network (DNN) model compression for efficient on-device inference is becoming increasingly important as a way to reduce memory requirements and keep user data on-device. To this end, we propose a novel differentiable k-means clustering layer (DKM) and apply it to train-time, weight-clustering-based DNN model compression. DKM casts k-means clustering as an attention problem and enables joint optimization of the model parameters and the clustering centroids. Unlike prior works that rely on additional regularizers and parameters, DKM-based compression keeps the original loss function and model architecture fixed. We evaluated DKM-based compression on various DNN models for computer vision and natural language processing (NLP) tasks. Our results demonstrate that DKM delivers a superior compression-accuracy trade-off on the ImageNet1k and GLUE benchmarks. For example, DKM-based compression can offer 74.5% top-1 ImageNet1k accuracy on a ResNet50 DNN model with a 3.3 MB model size (29.4x model compression factor). For MobileNet-v1, which is a challenging DNN to compress, DKM delivers 62.8% top-1 ImageNet1k accuracy with a 0.74 MB model size (22.4x model compression factor). This is 6.8% higher top-1 accuracy with a model that is 33% smaller in relative terms than the current state-of-the-art DNN compression algorithms. Additionally, DKM enables compression of the DistilBERT model by 11.8x with minimal (1.1%) accuracy loss on the GLUE NLP benchmarks.
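To make the "k-means as attention" idea concrete, the following is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes that each weight attends to every centroid through a softmax over negative absolute distances, scaled by a temperature, and that centroids are updated as attention-weighted means. The function names, the temperature value, the initial centroids, and the toy weights are all hypothetical; the abstract does not specify these details. The code is plain Python for readability; the same arithmetic written in an automatic-differentiation framework would let gradients flow to both the weights and the centroids.

```python
import math

def soft_assign(weights, centroids, temperature):
    """Return one attention row per weight: softmax over negative distances."""
    rows = []
    for w in weights:
        logits = [-abs(w - c) / temperature for c in centroids]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        rows.append([e / total for e in exps])
    return rows

def soft_kmeans(weights, centroids, temperature=0.05, iters=20):
    """Iterate attention-weighted centroid updates (hypothetical helper)."""
    centroids = list(centroids)
    for _ in range(iters):
        attn = soft_assign(weights, centroids, temperature)
        for j in range(len(centroids)):
            mass = sum(row[j] for row in attn)
            if mass > 0.0:
                # New centroid = attention-weighted mean of all weights.
                centroids[j] = sum(row[j] * w for row, w in zip(attn, weights)) / mass
    attn = soft_assign(weights, centroids, temperature)
    # Each weight is replaced by the attention-weighted mix of centroids.
    clustered = [sum(a * c for a, c in zip(row, centroids)) for row in attn]
    return centroids, attn, clustered

# Toy data (assumed): six weights forming two obvious groups.
weights = [-1.1, -1.0, -0.9, 0.9, 1.0, 1.1]
centroids, attn, clustered = soft_kmeans(weights, centroids=[-0.5, 0.5])
```

In this sketch a small temperature makes each attention row nearly one-hot, which approaches ordinary hard k-means, while a larger temperature spreads each weight across several centroids. Every step is a smooth function of the weights and centroids, which is what would allow such a layer to be trained jointly with the rest of the network.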

updated: Sat Aug 28 2021 14:35:41 GMT+0000 (UTC)

published: Sat Aug 28 2021 14:35:41 GMT+0000 (UTC)

arXiv
