Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars

Jingxiang Sun; Xuan Wang; Lizhen Wang; Xiaoyu Li; Yong Zhang; Hongwen Zhang; Yebin Liu

3D-aware generative adversarial networks (GANs) synthesize high-fidelity and multi-view-consistent facial images using only collections of single-view 2D imagery. Towards fine-grained control over facial attributes, recent efforts incorporate the 3D Morphable Face Model (3DMM) to describe deformation in generative radiance fields either explicitly or implicitly. Explicit methods provide fine-grained expression control but cannot handle topological changes caused by hair and accessories, while implicit ones can model varied topologies but generalize poorly because their deformation fields are unconstrained. We propose a novel 3D GAN framework for unsupervised learning of generative, high-quality and 3D-consistent facial avatars from unstructured 2D images. To achieve both deformation accuracy and topological flexibility, we propose a 3D representation called Generative Texture-Rasterized Tri-planes. The proposed representation learns Generative Neural Textures on top of parametric mesh templates and then projects them onto three orthogonal feature planes through rasterization, forming a tri-plane feature representation for volume rendering. In this way, we combine the fine-grained expression control of mesh-guided explicit deformation with the flexibility of an implicit volumetric representation. We further propose specific modules for modeling the mouth interior, which is not taken into account by 3DMM. Our method demonstrates state-of-the-art 3D-aware synthesis quality and animation ability through extensive experiments. Furthermore, serving as a 3D prior, our animatable 3D representation boosts multiple applications, including one-shot facial avatars and 3D-aware stylization.
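The tri-plane construction described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation, and several pieces are simplifying assumptions made only for illustration: per-vertex feature vectors stand in for samples of a neural texture, orthographic point splatting stands in for triangle rasterization, and the three plane features are combined by summation with a nearest-pixel lookup. The function names (`splat_to_plane`, `build_triplane`, `sample_triplane`) are hypothetical.

```python
import numpy as np

# Axis pairs spanning the three orthogonal planes: xy, xz, yz.
PLANE_AXES = ([0, 1], [0, 2], [1, 2])


def to_pixels(coords, res):
    """Map coordinates in [-1, 1] to integer pixel indices in [0, res - 1]."""
    ij = np.round((coords + 1.0) * 0.5 * (res - 1)).astype(int)
    return np.clip(ij, 0, res - 1)


def splat_to_plane(points, feats, axes, res):
    """Orthographically project points (N, 3) onto one plane and average
    the features that land on each pixel.

    Assumption: point splatting is a crude stand-in for rasterizing a
    textured mesh; a real rasterizer would fill whole triangles.
    """
    ij = to_pixels(points[:, axes], res)
    plane = np.zeros((res, res, feats.shape[1]))
    count = np.zeros((res, res, 1))
    # np.add.at accumulates correctly when several points hit one pixel.
    np.add.at(plane, (ij[:, 0], ij[:, 1]), feats)
    np.add.at(count, (ij[:, 0], ij[:, 1]), 1.0)
    return plane / np.maximum(count, 1.0)


def build_triplane(points, feats, res=32):
    """Build three orthogonal feature planes from feature-carrying points."""
    return [splat_to_plane(points, feats, ax, res) for ax in PLANE_AXES]


def sample_triplane(planes, query):
    """Look up a feature for each 3D query point (M, 3) by projecting it
    onto every plane and summing the three nearest-pixel features."""
    res = planes[0].shape[0]
    out = np.zeros((query.shape[0], planes[0].shape[2]))
    for plane, ax in zip(planes, PLANE_AXES):
        ij = to_pixels(query[:, ax], res)
        out += plane[ij[:, 0], ij[:, 1]]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    verts = rng.uniform(-1.0, 1.0, size=(500, 3))   # toy "mesh vertices"
    feats = rng.normal(size=(500, 8))               # toy "texture features"
    planes = build_triplane(verts, feats, res=32)
    queries = rng.uniform(-1.0, 1.0, size=(4, 3))   # points along camera rays
    print(sample_triplane(planes, queries).shape)
```

In a full pipeline the sampled features would be decoded into density and color and composited along camera rays by volume rendering. Deforming the vertices (for example, with 3DMM expression parameters) before building the planes is what would let the mesh drive the volumetric representation.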

updated: Mon Mar 13 2023 17:08:01 GMT+0000 (UTC)

published: Mon Nov 21 2022 06:40:46 GMT+0000 (UTC)

arXiv
