NVAE: A Deep Hierarchical Variational Autoencoder

Arash Vahdat; Jan Kautz

Normalizing flows, autoregressive models, variational autoencoders (VAEs), and deep energy-based models are among competing likelihood-based frameworks for deep generative learning. Among them, VAEs have the advantage of fast and tractable sampling and easy-to-access encoding networks. However, they are currently outperformed by other models such as normalizing flows and autoregressive models. While the majority of the research in VAEs is focused on the statistical challenges, we explore the orthogonal direction of carefully designing neural architectures for hierarchical VAEs. We propose Nouveau VAE (NVAE), a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization. NVAE is equipped with a residual parameterization of Normal distributions, and its training is stabilized by spectral regularization. We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, CelebA 64, and CelebA HQ datasets, and it provides a strong baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state-of-the-art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ. To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256×256 pixels. The source code is available at https://github.com/NVlabs/NVAE.
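
For readers unfamiliar with the "bits per dimension" metric quoted above, the following is a minimal sketch of the usual conversion between a per-image negative log-likelihood in nats and bits per dimension. It assumes CIFAR-10 images of 32×32×3 = 3072 dimensions; the function names are hypothetical and are not taken from the NVAE code base.

```python
import math

# Assumption: a CIFAR-10 image has 32 x 32 pixels with 3 color channels.
CIFAR10_DIMS = 32 * 32 * 3


def nats_to_bits_per_dim(nll_nats: float, num_dims: int) -> float:
    """Convert a per-image negative log-likelihood (nats) to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))


def bits_per_dim_to_nats(bpd: float, num_dims: int) -> float:
    """Inverse conversion: bits per dimension back to per-image nats."""
    return bpd * num_dims * math.log(2)


if __name__ == "__main__":
    # The two CIFAR-10 figures quoted in the abstract.
    for bpd in (2.98, 2.91):
        nats = bits_per_dim_to_nats(bpd, CIFAR10_DIMS)
        print(f"{bpd:.2f} bits/dim corresponds to about {nats:.1f} nats per image")
```

Under these assumptions, the reported improvement of 0.07 bits per dimension corresponds to roughly 149 nats per CIFAR-10 image.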

updated: Fri Jan 08 2021 03:08:58 GMT+0000 (UTC)

published: Wed Jul 08 2020 04:56:56 GMT+0000 (UTC)

arXiv
