Hierarchical Autoregressive Image Models with Auxiliary Decoders

Jeffrey De Fauw; Sander Dieleman; Karen Simonyan

Autoregressive generative models of images tend to be biased towards capturing local structure, and as a result they often produce samples which are lacking in terms of large-scale coherence. To address this, we propose two methods to learn discrete representations of images which abstract away local detail. We show that autoregressive models conditioned on these representations can produce high-fidelity reconstructions of images, and that we can train autoregressive priors on these representations that produce samples with large-scale coherence. We can recursively apply the learning procedure, yielding a hierarchy of progressively more abstract image representations. We train hierarchical class-conditional autoregressive models on the ImageNet dataset and demonstrate that they are able to generate realistic images at resolutions of 128$\times$128 and 256$\times$256 pixels. We also perform a human evaluation study comparing our models with both adversarial and likelihood-based state-of-the-art generative models.
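
As a rough sketch of the structure the abstract describes, the model can be written as a chain of conditional distributions. The notation is an assumption made here for illustration and is not taken from the source: $x$ denotes the image pixels, $N$ the number of pixels, and $z_1, \dots, z_L$ the progressively more abstract discrete representations, with $z_L$ the most abstract. Class conditioning is omitted for brevity.

```latex
% Assumed notation (not from the source): x = pixels, N = number of pixels,
% z_1, ..., z_L = discrete representations, z_L the most abstract.
\begin{align}
  % Autoregressive decoder conditioned on the lowest-level representation
  p(x \mid z_1) &= \prod_{i=1}^{N} p(x_i \mid x_{<i}, z_1) \\
  % Joint over pixels and all levels of the hierarchy
  p(x, z_1, \dots, z_L) &= p(z_L) \left[ \prod_{l=1}^{L-1} p(z_l \mid z_{l+1}) \right] p(x \mid z_1)
\end{align}
```

Under these assumptions, sampling would proceed from the top of the hierarchy downward: first draw $z_L$ from its autoregressive prior, then each $z_l$ given $z_{l+1}$, and finally the pixels $x$ given $z_1$. Each factor is itself an autoregressive model over the elements of its own level.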

updated: Tue Oct 08 2019 17:55:59 GMT+0000 (UTC)

published: Wed Mar 06 2019 22:13:52 GMT+0000 (UTC)

arXiv

