Generating Diverse Structure for Image Inpainting With Hierarchical VQ-VAE

Jialun Peng; Dong Liu; Songcen Xu; Houqiang Li

Given an incomplete image without additional constraints, image inpainting natively allows for multiple solutions as long as they appear plausible. Recently, multiple-solution inpainting methods have been proposed and have shown the potential to generate diverse results. However, these methods have difficulty ensuring the quality of each solution, e.g., they produce distorted structure and/or blurry texture. We propose a two-stage model for diverse inpainting, where the first stage generates multiple coarse results, each of which has a different structure, and the second stage refines each coarse result separately by augmenting texture. The proposed model is inspired by the hierarchical vector quantized variational auto-encoder (VQ-VAE), whose hierarchical architecture disentangles structural and textural information. In addition, the vector quantization in VQ-VAE enables autoregressive modeling of the discrete distribution over the structural information. Sampling from this distribution can easily generate diverse and high-quality structures, making up the first stage of our model. In the second stage, we propose a structural attention module inside the texture generation network, where the module utilizes the structural information to capture distant correlations. We further reuse the VQ-VAE to calculate two feature losses, which help improve structure coherence and texture realism, respectively. Experimental results on the CelebA-HQ, Places2, and ImageNet datasets show that our method not only enhances the diversity of the inpainting solutions but also improves the visual quality of the multiple generated images. Code and models are available at: https://github.com/USTC-JialunPeng/Diverse-Structure-Inpainting.
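
The two ideas behind the first stage, vector quantization and sampling from a discrete distribution, can be illustrated with a toy sketch. This is not the authors' implementation: the function names `quantize` and `sample_index`, the 2-D codebook, and the fixed logits are all hypothetical stand-ins. In the actual model the codebook is learned and the distribution over indices would come from a conditional autoregressive network rather than a constant vector.

```python
import math
import random


def quantize(vectors, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    indices = []
    for v in vectors:
        # Squared Euclidean distance from v to every codebook entry.
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def sample_index(logits, rng, temperature=1.0):
    """Draw one codebook index from the categorical distribution given by logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Toy 2-D codebook with three entries (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]
indices = quantize(features, codebook)

# Stand-in for an autoregressive prior: the same fixed logits at every
# position (an assumption made only to keep the example short).
rng = random.Random(0)
logits = [0.5, 0.3, 0.2]
samples = [[sample_index(logits, rng) for _ in range(4)] for _ in range(3)]
```

Each row of `samples` is one sequence of discrete structure codes. Because the codes are drawn at random rather than predicted deterministically, repeated draws can differ, which is the source of diversity the abstract describes.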

updated: Thu Mar 18 2021 05:10:49 GMT+0000 (UTC)

published: Thu Mar 18 2021 05:10:49 GMT+0000 (UTC)

arXiv
