DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder

Jie Shi; Chenfei Wu; Jian Liang; Xiang Liu; Nan Duan

The most successful recent image synthesis models are multi-stage pipelines that combine the advantages of different methods. They always include a VAE-like model that faithfully reconstructs an image from its embedding and a prior model that generates the image embedding. At the same time, diffusion models have shown the capacity to generate high-quality synthetic images. Our work proposes a VQ-VAE architecture with a diffusion decoder (DiVAE) that serves as the reconstruction component in image synthesis. We explore how to feed the image embedding into the diffusion model for strong performance and find that a simple modification to the diffusion model's UNet achieves this. Trained on ImageNet, our model achieves state-of-the-art results and, in particular, generates more photorealistic images. In addition, we apply DiVAE with an auto-regressive generator to conditional synthesis tasks and obtain samples that are more detailed and look more natural to human viewers.

updated: Wed Jun 01 2022 10:39:12 GMT+0000 (UTC)

published: Wed Jun 01 2022 10:39:12 GMT+0000 (UTC)

arXiv
