Post-training Quantization on Diffusion Models

Yuzhang Shang; Zhihang Yuan; Bin Xie; Bingzhe Wu; Yan Yan


Denoising diffusion (score-based) generative models have recently achieved significant results in generating realistic and diverse data. These approaches define a forward diffusion process that transforms data into noise and a backward denoising process that samples data from noise. Unfortunately, the generation process of current denoising diffusion models is notoriously slow because it requires many iterative noise estimations, each of which relies on a cumbersome neural network. This slowness prevents diffusion models from being widely deployed, especially on edge devices. Previous works accelerate the generation process of diffusion models (DMs) by finding shorter yet effective sampling trajectories. However, they overlook the cost of running a heavy noise estimation network in every iteration. In this work, we accelerate generation from the perspective of compressing the noise estimation network. Because DMs are difficult to retrain, we exclude mainstream training-aware compression paradigms and introduce post-training quantization (PTQ) into DM acceleration. However, the output distributions of noise estimation networks change with the time step, so previous PTQ methods, which are designed for single-time-step scenarios, fail on DMs. To devise a DM-specific PTQ method, we explore PTQ on DMs in three aspects: quantized operations, calibration dataset, and calibration metric. We summarize several observations derived from these comprehensive investigations and use them to formulate our method, which specifically targets the unique multi-time-step structure of DMs. Experimentally, our method can directly quantize full-precision DMs into 8-bit models while maintaining or even improving their performance in a training-free manner. Importantly, our method can serve as a plug-and-play module on top of other fast-sampling methods, e.g., DDIM.
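The point about time-step-dependent distributions can be illustrated with a toy sketch. This is not the authors' method: it uses plain uniform symmetric 8-bit quantization with a grid search over the scale, and the activations are synthetic Gaussian samples whose spread is assumed to grow with the time step. All function names and the data model (`synthetic_activations`, `calibrate_scale`, and so on) are hypothetical and introduced only for illustration.

```python
import random

def quantize(x, scale, bits=8):
    """Uniform symmetric quantization followed by dequantization."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(x / scale)))  # clip to the integer range
    return q * scale

def mse(values, scale, bits=8):
    """Mean squared error between values and their quantized versions."""
    return sum((v - quantize(v, scale, bits)) ** 2 for v in values) / len(values)

def calibrate_scale(values, bits=8, num_candidates=100):
    """Grid-search the scale that minimizes MSE on the calibration values."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    candidates = [max_abs / qmax * (i / num_candidates)
                  for i in range(1, num_candidates + 1)]
    return min(candidates, key=lambda s: mse(values, s, bits))

def synthetic_activations(t, n, rng):
    # Assumption for illustration only: the spread grows with time step t.
    return [rng.gauss(0.0, 0.1 + 0.05 * t) for _ in range(n)]

rng = random.Random(0)
timesteps = list(range(0, 100, 10))
per_step = {t: synthetic_activations(t, 200, rng) for t in timesteps}

# Calibrate on a single time step versus on samples pooled across time steps.
single_step_scale = calibrate_scale(per_step[0])
pooled = [v for t in timesteps for v in per_step[t]]
multi_step_scale = calibrate_scale(pooled)

# Evaluate both scales on activations from all time steps.
err_single = mse(pooled, single_step_scale)
err_multi = mse(pooled, multi_step_scale)
print(f"single-step calibration MSE: {err_single:.6f}")
print(f"multi-step calibration MSE:  {err_multi:.6f}")
```

Under these assumptions, the scale calibrated on one time step clips most activations from the other time steps, while the scale calibrated on pooled samples covers the whole range. How calibration samples should actually be drawn across time steps is the question the paper investigates; the uniform pooling here is only a stand-in.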

updated: Mon Nov 28 2022 19:33:39 GMT+0000 (UTC)

published: Mon Nov 28 2022 19:33:39 GMT+0000 (UTC)

arXiv
