Diffusion Probabilistic Model Made Slim

Xingyi Yang; Daquan Zhou; Jiashi Feng; Xinchao Wang

Despite their recent visually pleasing results, diffusion probabilistic models (DPMs) have long suffered from a massive computational cost, which greatly limits their use on resource-limited platforms. Prior work on efficient DPMs, however, has largely focused on accelerating sampling at test time while overlooking the models' huge complexity and size. In this paper, we make a dedicated attempt to lighten DPMs while striving to preserve their favourable performance. We start by training a small latent diffusion model (LDM) from scratch, but observe a significant fidelity drop in the synthesized images. Through a thorough assessment, we find that DPMs are intrinsically biased against high-frequency generation and learn to recover different frequency components at different time-steps. These properties leave compact networks unable to represent the frequency dynamics with accurate high-frequency estimation. To this end, we introduce a customized design for slim DPMs, which we term Spectral Diffusion (SD), for light-weight image synthesis. SD incorporates wavelet gating in its architecture to enable frequency-dynamic feature extraction at every reverse step, and conducts spectrum-aware distillation to promote high-frequency recovery by inversely weighting the objective based on spectrum magnitudes. Experimental results demonstrate that SD achieves an 8-18x reduction in computational complexity compared to latent diffusion models on a series of conditional and unconditional image generation tasks while retaining competitive image fidelity.
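The idea of inversely weighting a distillation objective by spectrum magnitude can be illustrated with a minimal NumPy sketch. This is not the paper's formulation, which the abstract does not spell out. The function name `spectrum_weighted_loss`, the exponent `alpha`, the stabiliser `eps`, the use of a 2-D FFT, and the choice to derive the weights from the teacher's spectrum are all assumptions made here for illustration.

```python
import numpy as np

def spectrum_weighted_loss(student, teacher, alpha=1.0, eps=1e-8):
    """Hypothetical frequency-weighted distillation loss (illustration only).

    student, teacher: real arrays whose last two axes are spatial.
    alpha: assumed exponent controlling how strongly weak bins are up-weighted.
    """
    # Orthonormal 2-D FFT over the last two axes, so that with alpha=0
    # the loss reduces to the ordinary mean squared error (Parseval).
    fs = np.fft.fft2(student, norm="ortho")
    ft = np.fft.fft2(teacher, norm="ortho")

    # Inverse-magnitude weights: bins where the teacher spectrum is weak
    # (typically high frequencies in natural images) receive larger weights.
    w = 1.0 / (np.abs(ft) + eps) ** alpha
    w = w / w.mean()  # normalise so the average weight is 1

    # Weighted squared error between the two spectra.
    return float(np.mean(w * np.abs(fs - ft) ** 2))
```

Under these assumptions, an error of fixed energy costs more when it falls in a frequency bin where the teacher's spectrum is small than when it falls in the dominant low-frequency bins. That is the qualitative behaviour the abstract attributes to spectrum-aware distillation.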

updated: Sun Nov 27 2022 16:27:28 GMT+0000 (UTC)

published: Sun Nov 27 2022 16:27:28 GMT+0000 (UTC)

arXiv
