Optimizing Hierarchical Image VAEs for Sample Quality

Eric Luhman; Troy Luhman

While hierarchical variational autoencoders (VAEs) have achieved strong density estimation on image modeling tasks, samples from their prior tend to look less convincing than those from models with similar log-likelihood. We attribute this to learned representations that over-emphasize compressing imperceptible details of the image. To address this, we introduce a KL-reweighting strategy to control the amount of information in each latent group, and employ a Gaussian output layer to reduce sharpness in the learning objective. To trade off image diversity for fidelity, we additionally introduce a classifier-free guidance strategy for hierarchical VAEs. We demonstrate the effectiveness of these techniques in our experiments. Code is available at https://github.com/tcl9876/visual-vae.

updated: Tue Oct 18 2022 23:10:58 GMT+0000 (UTC)

published: Tue Oct 18 2022 23:10:58 GMT+0000 (UTC)

arXiv