Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint

Hongyu Liu; Yibing Song; Qifeng Chen

GAN inversion and editing via StyleGAN map an input image into the embedding spaces (W, W^+, and F) to simultaneously maintain image fidelity and meaningful manipulation. From the latent space W to the extended latent space W^+ to the feature space F in StyleGAN, the editability of GAN inversion decreases while its reconstruction quality increases. Recent GAN inversion methods typically explore W^+ and F rather than W to improve reconstruction fidelity while maintaining editability. As W^+ and F are derived from W, which is essentially the foundation latent space of StyleGAN, these GAN inversion methods focusing on the W^+ and F spaces could be improved by stepping back to W. In this work, we propose to first obtain the precise latent code in the foundation latent space W. We introduce contrastive learning to align W and the image space for precise latent code discovery. Then, we leverage a cross-attention encoder to transform the obtained latent code in W into W^+ and F, accordingly. Our experiments show that our exploration of the foundation latent space W improves the representation ability of latent codes in W^+ and features in F, which yields state-of-the-art reconstruction fidelity and editability results on the standard benchmarks. Project page: https://github.com/KumapowerLIU/CLCAE.
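
As a rough illustration of what aligning W with the image space by contrastive learning can look like, the sketch below computes a symmetric InfoNCE-style loss over a batch of paired image embeddings and latent-code embeddings. The abstract does not state the exact loss, so the InfoNCE form, the temperature value, and every function name here are assumptions for illustration, not the authors' implementation. The embeddings are assumed to be non-zero vectors already projected into a shared space of equal dimension.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries, keys, temperature=0.07):
    """Mean cross-entropy of matching queries[i] to keys[i] among all keys.

    Hypothetical helper: the pair (queries[i], keys[i]) is the positive,
    every other key in the batch acts as a negative.
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarities scaled by the temperature (assumed value).
        logits = [dot(qi, kj) / temperature for kj in k]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]
    return total / len(q)


def symmetric_alignment_loss(image_embs, latent_embs, temperature=0.07):
    """Average of the image-to-latent and latent-to-image InfoNCE terms."""
    return 0.5 * (
        info_nce(image_embs, latent_embs, temperature)
        + info_nce(latent_embs, image_embs, temperature)
    )


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[2.0, 0.0], [0.0, 3.0]]     # each latent points the same way as its image
    misaligned = [[0.0, 3.0], [2.0, 0.0]]  # pairs swapped
    print(symmetric_alignment_loss(images, aligned))
    print(symmetric_alignment_loss(images, misaligned))
```

With correctly paired embeddings the loss is close to zero, and swapping the pairs makes it large, which is the signal that would push an encoder's W codes toward their own images and away from the others in the batch.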

updated: Fri Mar 10 2023 05:45:28 GMT+0000 (UTC)

published: Mon Nov 21 2022 13:35:32 GMT+0000 (UTC)

arXiv
