Fast Sampling of Diffusion Models via Operator Learning

Hongkai Zheng; Weili Nie; Arash Vahdat; Kamyar Azizzadenesheli; Anima Anandkumar

Diffusion models have found widespread adoption in various areas. However, their sampling process is slow because it requires hundreds to thousands of network evaluations to emulate a continuous process defined by differential equations. In this work, we use neural operators, an efficient method for solving the probability flow differential equations, to accelerate the sampling process of diffusion models. Unlike other fast sampling methods, which are sequential in nature, ours is the first parallel decoding method that generates images with only one model forward pass. We propose diffusion model sampling with neural operator (DSNO), which maps the initial condition, i.e., a sample from the Gaussian distribution, to the continuous-time solution trajectory of the reverse diffusion process. To model the temporal correlations along the trajectory, we introduce temporal convolution layers, parameterized in Fourier space, into the given diffusion model backbone. We show that our method achieves state-of-the-art FID of 3.78 for CIFAR-10 and 7.83 for ImageNet-64 in the one-model-evaluation setting.
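
To make the idea of a temporal convolution parameterized in Fourier space more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a simplified setting in which the trajectory is a real array of shape `(T, C)` (time steps by channels) rather than image feature maps inside a diffusion backbone. The function name `temporal_fourier_conv`, the `weights` layout, and the mode truncation are illustrative assumptions in the style of Fourier neural operator layers.

```python
import numpy as np


def temporal_fourier_conv(u, weights):
    """Convolve along the time axis by multiplying in Fourier space.

    u:       real array of shape (T, C), features at T time steps (assumed layout).
    weights: complex array of shape (modes, C, C), one channel-mixing matrix
             per retained temporal frequency (hypothetical parameterization).
    """
    T, C = u.shape
    modes = weights.shape[0]
    assert modes <= T // 2 + 1, "cannot keep more modes than rfft produces"

    # Real FFT along time: shape (T // 2 + 1, C).
    u_hat = np.fft.rfft(u, axis=0)

    # Mix channels for the lowest `modes` frequencies; drop the rest.
    out_hat = np.zeros_like(u_hat)
    out_hat[:modes] = np.einsum("kc,kcd->kd", u_hat[:modes], weights)

    # Back to the time domain, shape (T, C).
    return np.fft.irfft(out_hat, n=T, axis=0)


# Usage: random trajectory features and random complex weights.
rng = np.random.default_rng(0)
u = rng.standard_normal((8, 3))
w = rng.standard_normal((4, 3, 3)) + 1j * rng.standard_normal((4, 3, 3))
v = temporal_fourier_conv(u, w)  # v has the same shape as u: (8, 3)
```

Because the weights act on temporal frequencies rather than on a fixed time grid, a layer of this kind processes all time steps of the trajectory at once, which is in keeping with the parallel, single-forward-pass decoding described in the abstract.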

updated: Sat Jul 22 2023 08:47:10 GMT+0000 (UTC)

published: Thu Nov 24 2022 07:30:27 GMT+0000 (UTC)

arXiv
