Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images

Rewon Child

We present a hierarchical VAE that, for the first time, generates samples quickly while outperforming the PixelCNN in log-likelihood on all natural image benchmarks. We begin by observing that, in theory, VAEs can represent autoregressive models, as well as faster, better models if they exist, when made sufficiently deep. Despite this, autoregressive models have historically outperformed VAEs in log-likelihood. We test whether insufficient depth explains this by scaling a VAE to greater stochastic depth than previously explored and evaluating it on CIFAR-10, ImageNet, and FFHQ. In comparison to the PixelCNN, these very deep VAEs achieve higher likelihoods, use fewer parameters, generate samples thousands of times faster, and are more easily applied to high-resolution images. Qualitative studies suggest this is because the VAE learns efficient hierarchical visual representations. We release our source code and models at https://github.com/openai/vdvae.
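
The claim that a sufficiently deep VAE can represent an autoregressive model can be illustrated with a short sketch. The notation here ($z_i$, $p_\theta$, $q_\phi$, $p_{\mathrm{AR}}$, $N$) is assumed for illustration and does not appear in the abstract, and the argument is stated for discrete pixel values only. Take a hierarchical VAE with one latent variable per pixel, let the approximate posterior copy each pixel into its latent, let the prior reproduce the autoregressive conditionals, and let the decoder return the latents unchanged:

```latex
% Hierarchical VAE with N stochastic layers (assumed notation)
\begin{align}
p_\theta(z) &= \prod_{i=1}^{N} p_\theta(z_i \mid z_{<i}), &
q_\phi(z \mid x) &= \prod_{i=1}^{N} q_\phi(z_i \mid z_{<i}, x), \\
\log p_\theta(x) &\ge
  \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - \mathrm{KL}\big(q_\phi(z \mid x) \,\|\, p_\theta(z)\big).
\end{align}

% Assumed special case: N = number of pixels, discrete x_i
\begin{align}
q_\phi(z_i \mid z_{<i}, x) &= \delta(z_i = x_i), &
p_\theta(z_i \mid z_{<i}) &= p_{\mathrm{AR}}(x_i = z_i \mid x_{<i} = z_{<i}), &
p_\theta(x \mid z) &= \delta(x = z).
\end{align}

% The reconstruction term is \log 1 = 0 and the KL term reduces to
% -\sum_i \log p_{AR}(x_i | x_{<i}), so the bound is
\begin{equation}
\text{ELBO}(x) = \sum_{i=1}^{N} \log p_{\mathrm{AR}}(x_i \mid x_{<i})
             = \log p_{\mathrm{AR}}(x).
\end{equation}
```

Under these assumptions the bound coincides with the autoregressive log-likelihood, which is one way to see why depth, rather than the VAE framework itself, may be the limiting factor the abstract describes.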

updated: Tue Mar 16 2021 18:33:19 GMT+0000 (UTC)

published: Fri Nov 20 2020 21:35:31 GMT+0000 (UTC)

arXiv
