Elucidating the Design Space of Diffusion-Based Generative Models

Tero Karras; Miika Aittala; Timo Aila; Samuli Laine

We argue that the theory and practice of diffusion-based generative models are currently unnecessarily convoluted and seek to remedy the situation by presenting a design space that clearly separates the concrete design choices. This lets us identify several changes to both the sampling and training processes, as well as preconditioning of the score networks. Together, our improvements yield new state-of-the-art FID of 1.79 for CIFAR-10 in a class-conditional setting and 1.97 in an unconditional setting, with much faster sampling (35 network evaluations per image) than prior designs. To further demonstrate their modular nature, we show that our design changes dramatically improve both the efficiency and quality obtainable with pre-trained score networks from previous work, including improving the FID of a previously trained ImageNet-64 model from 2.07 to near-SOTA 1.55, and after re-training with our proposed improvements to a new SOTA of 1.36.

updated: Tue Oct 11 2022 13:20:30 GMT+0000 (UTC)

published: Wed Jun 01 2022 10:03:24 GMT+0000 (UTC)

arXiv
