Variational Diffusion Auto-encoder: Latent Space Extraction from Pre-trained Diffusion Models

Georgios Batzolis; Jan Stanczuk; Carola-Bibiane Schönlieb

As a widely recognized approach to deep generative modeling, Variational Auto-Encoders (VAEs) still struggle with the quality of generated images, which are often noticeably blurry. This issue stems from the unrealistic assumption that the conditional data distribution p(x | z) is an isotropic Gaussian. In this paper, we propose a novel solution to this problem. We show how a latent space can be extracted from a pre-existing diffusion model by optimizing an encoder to maximize the marginal data log-likelihood. Furthermore, we demonstrate that, once the encoder is trained, a decoder can be derived analytically using the Bayes rule for scores. This yields a VAE-like deep latent variable model that requires neither a Gaussian assumption on p(x | z) nor the training of a separate decoder network. By building on the strengths of pre-trained diffusion models and equipping them with latent spaces, our method significantly improves on the performance of VAEs.
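The analytic decoder rests on Bayes' rule applied to score functions. Since log p(x | z) = log p(x) + log p(z | x) - log p(z), and log p(z) does not depend on x, taking the gradient in x gives grad_x log p(x | z) = grad_x log p(x) + grad_x log p(z | x). In the diffusion setting, the first term would come from the pre-trained score model and the second from the encoder; the exact parameterization used in the paper is not stated in this abstract. The sketch below checks the identity on a hypothetical 1-D Gaussian toy model (prior x ~ N(0, 1), encoder-like likelihood z | x ~ N(x, s2)), where the posterior score is also known in closed form. It is an illustration under these assumptions, not the authors' implementation.

```python
# Toy 1-D check of the Bayes rule for scores (hypothetical model, illustration only):
#   prior        x ~ N(0, 1)
#   likelihood   z | x ~ N(x, s2)   (stands in for a learned encoder p(z | x))
# The exact posterior p(x | z) is Gaussian, so its score is known in closed form.


def prior_score(x: float) -> float:
    # d/dx log N(x; 0, 1)
    return -x


def likelihood_score(x: float, z: float, s2: float) -> float:
    # d/dx log N(z; x, s2)
    return (z - x) / s2


def posterior_score_bayes(x: float, z: float, s2: float) -> float:
    # Bayes rule for scores: grad log p(x|z) = grad log p(x) + grad log p(z|x).
    # The grad log p(z) term is dropped because it does not depend on x.
    return prior_score(x) + likelihood_score(x, z, s2)


def posterior_score_exact(x: float, z: float, s2: float) -> float:
    # Closed-form Gaussian posterior: mean z / (1 + s2), variance s2 / (1 + s2).
    mean = z / (1.0 + s2)
    var = s2 / (1.0 + s2)
    return -(x - mean) / var


if __name__ == "__main__":
    z, s2 = 1.5, 0.25
    for x in (-1.0, 0.0, 0.7, 2.0):
        b = posterior_score_bayes(x, z, s2)
        e = posterior_score_exact(x, z, s2)
        print(f"x={x:+.1f}  bayes={b:+.4f}  exact={e:+.4f}")
```

Both columns agree for every x. In a full model, the combined conditional score would typically be plugged into a reverse-time diffusion sampler to draw x given z, which is what removes the need for a separately trained decoder network.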

updated: Thu May 18 2023 22:44:12 GMT+0000 (UTC)

published: Mon Apr 24 2023 14:44:47 GMT+0000 (UTC)

arXiv
