Improved Vector Quantized Diffusion Models

Zhicong Tang; Shuyang Gu; Jianmin Bao; Dong Chen; Fang Wen


Vector quantized diffusion (VQ-Diffusion) is a powerful generative model for text-to-image synthesis, but it can still generate low-quality samples or images that are only weakly correlated with the text input. We find that these issues are mainly due to a flawed sampling strategy. In this paper, we propose two techniques to further improve the sample quality of VQ-Diffusion. 1) We explore classifier-free guidance sampling for discrete denoising diffusion models and propose a more general and effective implementation of classifier-free guidance. 2) We present a high-quality inference strategy to alleviate the joint distribution issue in VQ-Diffusion. Finally, we conduct experiments on various datasets to validate the effectiveness of both techniques and show that the improved VQ-Diffusion surpasses the vanilla version by large margins. We achieve an FID score of 8.44 on MSCOCO, surpassing VQ-Diffusion by 5.42 FID. When trained on ImageNet, we improve the FID score from 11.89 to 4.83, demonstrating the superiority of the proposed techniques.
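The abstract does not spell out how classifier-free guidance is applied to discrete tokens, so the following is only a minimal sketch of the commonly used log-space form, not the paper's exact implementation. It assumes that a model yields conditional and unconditional logits over the codebook for a single token; the function names, the `scale` parameter, and the toy logits are all hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]

def guided_log_probs(cond_logits, uncond_logits, scale):
    """Generic classifier-free guidance for one categorical token (assumed form).

    Combines log p(x|y) + scale * (log p(x|y) - log p(x)) and renormalizes.
    With scale = 0 the conditional distribution is returned unchanged.
    """
    log_c = log_softmax(cond_logits)
    log_u = log_softmax(uncond_logits)
    mixed = [c + scale * (c - u) for c, u in zip(log_c, log_u)]
    return log_softmax(mixed)

# Toy example with a 3-entry codebook (values are illustrative only).
cond = [2.0, 1.0, 0.0]      # hypothetical logits given the text prompt
uncond = [1.0, 1.0, 1.0]    # hypothetical logits with the prompt dropped
plain = guided_log_probs(cond, uncond, 0.0)
guided = guided_log_probs(cond, uncond, 3.0)
```

In this toy case, raising `scale` concentrates probability mass on the token that the text condition favors relative to the unconditional prediction, which is the intended effect of guidance on text-image correlation.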

updated: Tue May 31 2022 17:59:53 GMT+0000 (UTC)

published: Tue May 31 2022 17:59:53 GMT+0000 (UTC)

arXiv

