Sharpness-aware Quantization for Deep Neural Networks

Jing Liu; Jianfei Cai; Bohan Zhuang

Network quantization has gained increasing attention because it can significantly reduce model size and computational overhead. However, due to the discrete nature of quantization, a small change in the full-precision weights can cause a large change in the quantized weights, which leads to severe loss fluctuations and thus a sharp loss landscape. The fluctuating loss makes gradients unstable during training, resulting in considerable performance degradation. Recently, Sharpness-Aware Minimization (SAM) has been proposed to smooth the loss landscape and improve the generalization performance of models. Nevertheless, customizing SAM for quantized models is non-trivial because of the clipping and discretization involved in quantization. In this paper, we propose a novel method, dubbed Sharpness-Aware Quantization (SAQ), to smooth the loss landscape and improve the generalization performance of quantized models. This is the first work to explore the effect of SAM in model compression, particularly quantization. Specifically, we first propose a unified view of quantization and SAM, in which both are treated as introducing perturbations to the model weights: quantization noise and adversarial perturbations, respectively. Depending on whether the quantization noise and the adversarial perturbations depend on each other, SAQ can be divided into three cases, which we then analyze and compare comprehensively. Extensive experiments on both convolutional neural networks and Transformers show that SAQ improves the generalization performance of quantized models, yielding state-of-the-art results in uniform quantization. For example, on ImageNet, SAQ outperforms the model trained with the conventional optimization procedure (i.e., SGD) by 1.1% in Top-1 accuracy on 4-bit ResNet-50. Code is available at https://github.com/zip-group/SAQ.
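The two kinds of weight perturbation named in the abstract can be illustrated with a small sketch. It is not taken from the SAQ code base: `quantize_uniform` is a hypothetical, generic symmetric uniform quantizer (the paper's quantizer may differ), `sam_perturbation` uses the standard first-order SAM perturbation rho * g / ||g||_2, and the values of `bits`, `clip`, and `rho` are arbitrary illustrative choices. The three SAQ cases are not reproduced here.

```python
import math


def quantize_uniform(w, bits=4, clip=1.0):
    """Hypothetical symmetric uniform quantizer (an assumption, not SAQ's own).

    Clips w to [-clip, clip], then rounds it to the nearest grid point.
    """
    levels = 2 ** (bits - 1) - 1      # 7 positive levels for 4 bits
    step = clip / levels              # spacing between adjacent grid points
    w_clipped = max(-clip, min(clip, w))
    return round(w_clipped / step) * step


def sam_perturbation(grad, rho=0.05):
    """Standard first-order SAM perturbation: rho * g / ||g||_2.

    rho is the neighborhood radius; 0.05 is only an illustrative value.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0 for _ in grad]
    return [rho * g / norm for g in grad]


# Discreteness: two full-precision weights that differ by only 0.001 but
# sit on opposite sides of a rounding boundary (step / 2, about 0.0714).
w_a, w_b = 0.071, 0.072
q_a, q_b = quantize_uniform(w_a), quantize_uniform(w_b)
print("full-precision change:", w_b - w_a)
print("quantized change:     ", q_b - q_a)   # one full grid step (1/7 here)

# Quantization viewed as noise added to the weight: Q(w) = w + noise.
noise_b = q_b - w_b
print("quantization noise:   ", noise_b)

# SAM viewed as an adversarial perturbation along the gradient direction.
eps = sam_perturbation([3.0, 4.0], rho=0.05)
print("SAM perturbation:     ", eps)
```

In this toy setting a change of 0.001 in the full-precision weight moves the quantized weight by an entire grid step, which is the kind of jump the abstract identifies as the source of loss fluctuations, while the SAM perturbation always has a fixed L2 norm equal to `rho`.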

updated: Sat Oct 01 2022 00:59:20 GMT+0000 (UTC)

published: Wed Nov 24 2021 05:16:41 GMT+0000 (UTC)

arXiv
