Sharpness-aware Quantization for Deep Neural Networks

Jing Liu; Jianfei Cai; Bohan Zhuang

Network quantization is a dominant paradigm of model compression. However, abrupt changes in the quantized weights during training often lead to severe loss fluctuations and a sharp loss landscape, making the gradients unstable and thus degrading performance. Recently, Sharpness-Aware Minimization (SAM) has been proposed to smooth the loss landscape and improve the generalization performance of models. Nevertheless, directly applying SAM to quantized models can lead to perturbation mismatch or diminishment issues, resulting in suboptimal performance. In this paper, we propose a novel method, dubbed Sharpness-Aware Quantization (SAQ), to explore the effect of SAM in model compression, and in quantization in particular, for the first time. Specifically, we first provide a unified view of quantization and SAM by treating them as introducing quantization noise and adversarial perturbations, respectively, to the model weights. Depending on whether the noise and perturbation terms depend on each other, SAQ can be formulated into three cases, which we analyze and compare comprehensively. Furthermore, by introducing an efficient training strategy, SAQ incurs only a small additional training overhead compared with the default optimizer (e.g., SGD or AdamW). Extensive experiments on both convolutional neural networks and Transformers across various datasets (i.e., ImageNet, CIFAR-10/100, Oxford Flowers-102, Oxford-IIIT Pets) show that SAQ improves the generalization performance of quantized models, yielding state-of-the-art (SOTA) results in uniform quantization. For example, on ImageNet, SAQ outperforms AdamW by 1.2% in Top-1 accuracy for 4-bit ViT-B/16, and our 4-bit ResNet-50 surpasses the previous SOTA method by 0.9% in Top-1 accuracy.
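The unified view of quantization noise and adversarial perturbation can be made concrete with a toy example. The sketch below is a hypothetical Python illustration, not the authors' implementation. It assumes a fixed symmetric 4-bit quantizer, a two-weight linear model with squared loss, and a straight-through estimator, and every name in it (`quantize`, `loss_and_grad`, `sam_on_quantized_step`, `rho`, `lr`) is ours. It shows (a) perturbation diminishment, where a small perturbation added before quantization, Q(w + ε), is simply rounded away, and (b) one SAM-style step in which ε is computed at, and added to, the already-quantized weights, Q(w) + ε. We do not claim it corresponds exactly to any one of the paper's three SAQ cases.

```python
import math

def quantize(w, bits=4, clip=1.0):
    # Hypothetical symmetric uniform quantizer with a fixed clipping range;
    # the quantizer used in the paper (e.g. learned step sizes) may differ.
    levels = 2 ** (bits - 1) - 1              # 7 positive levels at 4 bits
    step = clip / levels
    return [max(-clip, min(clip, math.floor(v / step + 0.5) * step)) for v in w]

def loss_and_grad(w, data):
    # Mean squared error of a linear model y_hat = w . x and its analytic gradient.
    n = len(data)
    loss, grad = 0.0, [0.0] * len(w)
    for x, y in data:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        loss += err * err / n
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / n
    return loss, grad

def sam_on_quantized_step(w, data, lr=0.1, rho=0.05, bits=4):
    # 1) Quantize the latent full-precision weights.
    wq = quantize(w, bits)
    # 2) Adversarial perturbation of norm rho along the gradient at Q(w).
    _, g = loss_and_grad(wq, data)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    eps = [rho * gi / norm for gi in g]
    # 3) Gradient at Q(w) + eps; the perturbation is added after quantization,
    #    so it cannot be rounded away.
    _, g_sam = loss_and_grad([q + e for q, e in zip(wq, eps)], data)
    # 4) Straight-through estimator: treat dQ/dw as identity and update w.
    return [wi - lr * gi for wi, gi in zip(w, g_sam)]

# Perturbation diminishment: a small eps applied *before* quantization vanishes.
w0 = [0.30, -0.55]
small_eps = [0.01, -0.01]
diminished = quantize([a + e for a, e in zip(w0, small_eps)]) == quantize(w0)

# Toy training run on data generated by w* = [0.4, -0.3].
w_star = [0.4, -0.3]
data = [(x, sum(a * b for a, b in zip(w_star, x)))
        for x in [(1, 0), (0, 1), (1, 1), (1, -1)]]
w = [0.0, 0.0]
before, _ = loss_and_grad(quantize(w), data)
for _ in range(200):
    w = sam_on_quantized_step(w, data)
after, _ = loss_and_grad(quantize(w), data)
print(diminished, before, after)
```

Because w* does not lie on the 4-bit grid, the latent weights end up hovering around a rounding boundary near w*, so the loss of the quantized model drops well below its starting value but does not reach zero. The efficient training strategy mentioned in the abstract is not modeled here.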

updated: Tue Mar 21 2023 10:31:46 GMT+0000 (UTC)

published: Wed Nov 24 2021 05:16:41 GMT+0000 (UTC)

arXiv