Self-Supervised Geometry-Aware Encoder for Style-Based 3D GAN Inversion

Yushi Lan; Xuyi Meng; Shuai Yang; Chen Change Loy; Bo Dai

StyleGAN has achieved great progress in 2D face reconstruction and semantic editing via image inversion and latent editing. While studies on extending 2D StyleGAN to 3D faces have emerged, a corresponding generic 3D GAN inversion framework is still missing, which limits the applications of 3D face reconstruction and semantic editing. In this paper, we study the challenging problem of 3D GAN inversion, where a latent code is predicted from a single face image to faithfully recover its 3D shape and detailed texture. The problem is ill-posed: innumerable combinations of shape and texture could render to the same image. Furthermore, because a global latent code has limited capacity, 2D inversion methods cannot preserve faithful shape and texture at the same time when applied to 3D models. To solve this problem, we devise an effective self-training scheme to constrain the learning of inversion. The learning is done efficiently without any real-world 2D-3D training pairs, using proxy samples generated from a 3D GAN instead. In addition, apart from a global latent code that captures the coarse shape and texture information, we augment the generation network with a local branch, where pixel-aligned features are added to faithfully reconstruct face details. We further consider a new pipeline to perform 3D view-consistent editing. Extensive experiments show that our method outperforms state-of-the-art inversion methods in both shape and texture reconstruction quality. Code and data will be released.
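
The self-training idea in the abstract (sample latent codes, render proxy images with a frozen generator, and train an encoder to predict the codes back) can be illustrated with a deliberately tiny toy. This is a sketch under strong assumptions, not the authors' implementation: the "generator" is a hypothetical fixed linear map rather than a 3D GAN, the encoder is linear, and all names, dimensions, and hyperparameters (`GENERATOR`, `train_encoder`, `lr`, `steps`) are invented for illustration. It omits rendering, the local pixel-aligned branch, and editing entirely.

```python
import random

# Hypothetical stand-in for a pretrained, frozen generator: a fixed linear map
# from a 2-D latent code to a 4-D "image". Purely illustrative, not a 3D GAN.
GENERATOR = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]


def generate(latent):
    """Produce a proxy 'image' from a latent code with the frozen generator."""
    return [sum(a * w for a, w in zip(row, latent)) for row in GENERATOR]


def encode(encoder, image):
    """Predict a latent code from an image with a linear encoder."""
    return [sum(b * x for b, x in zip(row, image)) for row in encoder]


def train_encoder(steps=2000, lr=0.05, seed=0):
    """Fit the encoder on (latent, image) proxy pairs; no real data is used."""
    rng = random.Random(seed)
    encoder = [[0.0] * 4 for _ in range(2)]
    for _ in range(steps):
        # Sample a latent code and generate its proxy image; the sampled
        # code is the supervision signal.
        latent = [rng.uniform(-1.0, 1.0) for _ in range(2)]
        image = generate(latent)
        error = [p - t for p, t in zip(encode(encoder, image), latent)]
        # One SGD step on the squared latent-reconstruction error.
        for i in range(2):
            for j in range(4):
                encoder[i][j] -= lr * 2.0 * error[i] * image[j]
    return encoder


encoder = train_encoder()
target = [0.3, -0.7]  # a held-out latent code, never seen during training
recovered = encode(encoder, generate(target))
print(recovered)
```

Because the toy generator is linear and exactly invertible on its range, the encoder recovers held-out codes almost perfectly. The real problem is ill-posed, which is why the abstract emphasizes constraining the learning and adding a local branch.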

updated: Wed Dec 14 2022 18:49:50 GMT+0000 (UTC)

published: Wed Dec 14 2022 18:49:50 GMT+0000 (UTC)

arXiv
