PTQD: Accurate Post-Training Quantization for Diffusion Models

Yefei He; Luping Liu; Jing Liu; Weijia Wu; Hong Zhou; Bohan Zhuang

Diffusion models have recently dominated image synthesis and related generative tasks. However, the iterative denoising process is computationally expensive at inference time, making diffusion models less practical for low-latency and scalable real-world applications. Post-training quantization of diffusion models can significantly reduce model size and accelerate sampling without any re-training. Nonetheless, applying existing post-training quantization methods directly to low-bit diffusion models can significantly impair the quality of generated samples. Specifically, at each denoising step, quantization noise causes deviations in the estimated mean and a mismatch with the predetermined variance schedule. Moreover, as sampling proceeds, the quantization noise may accumulate, resulting in a low signal-to-noise ratio (SNR) in late denoising steps. To address these challenges, we propose a unified formulation for the quantization noise and the diffusion perturbed noise in the quantized denoising process. We first disentangle the quantization noise into a part correlated with its full-precision counterpart and a residual uncorrelated part. The correlated part can be easily corrected by estimating the correlation coefficient. For the uncorrelated part, we calibrate the denoising variance schedule to absorb the excess variance resulting from quantization. Moreover, we propose a mixed-precision scheme that chooses the optimal bitwidth for each denoising step, preferring low bits to accelerate the early denoising steps and high bits to maintain a high SNR in the late steps. Extensive experiments demonstrate that our method outperforms previous post-training quantized diffusion models in generating high-quality samples, with only a 0.06 increase in FID score compared to full-precision LDM-4 on ImageNet 256x256, while saving 19.9x bit operations.
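The decomposition described in the abstract can be illustrated with a small sketch. This is not the paper's implementation. It assumes that the correlated part is modeled as a scalar multiple k of the full-precision output, that k is estimated as a least-squares slope, and that the excess variance enters the sampler through a hypothetical coefficient `coeff`. The function names and the synthetic data are illustrative only.

```python
import random
import statistics


def decompose_quantization_noise(fp_out, q_out):
    """Split the noise (q_out - fp_out) into k * fp_out plus a residual.

    Assumption: k is the least-squares slope of the noise on fp_out, so the
    residual is uncorrelated with fp_out by construction.
    """
    noise = [q - f for q, f in zip(q_out, fp_out)]
    mean_f = statistics.fmean(fp_out)
    mean_n = statistics.fmean(noise)
    cov = sum((f - mean_f) * (n - mean_n) for f, n in zip(fp_out, noise))
    var = sum((f - mean_f) ** 2 for f in fp_out)
    k = cov / var
    residual = [n - k * f for n, f in zip(noise, fp_out)]
    return k, residual


def calibrated_variance(sigma2, residual, k, coeff=1.0):
    """Shrink the scheduled variance by the variance of the rescaled residual.

    Assumption: `coeff` stands in for whatever factor multiplies the network
    output in the sampler update. The result is clamped at zero.
    """
    excess = coeff ** 2 * statistics.pvariance(residual) / (1.0 + k) ** 2
    return max(sigma2 - excess, 0.0)


# Synthetic stand-in for calibration data: the "quantized" output is the
# full-precision output scaled by 1.1 plus independent Gaussian noise.
rng = random.Random(0)
fp_out = [rng.gauss(0.0, 1.0) for _ in range(20000)]
q_out = [1.1 * f + rng.gauss(0.0, 0.05) for f in fp_out]

k, residual = decompose_quantization_noise(fp_out, q_out)
corrected = [q / (1.0 + k) for q in q_out]  # removes the correlated part
sigma2_new = calibrated_variance(0.01, residual, k)
print(f"k = {k:.4f}, calibrated variance = {sigma2_new:.5f}")
```

Dividing the quantized output by 1 + k removes the correlated part, which leaves only the rescaled residual. The variance of that residual is what the adjusted schedule absorbs in this sketch.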

updated: Thu Jun 08 2023 01:26:05 GMT+0000 (UTC)

published: Thu May 18 2023 02:28:42 GMT+0000 (UTC)

arXiv
