OmniAvatar: Geometry-Guided Controllable 3D Head Synthesis

Hongyi Xu; Guoxian Song; Zihang Jiang; Jianfeng Zhang; Yichun Shi; Jing Liu; Wanchun Ma; Jiashi Feng; Linjie Luo

We present OmniAvatar, a novel geometry-guided 3D head synthesis model trained on unstructured in-the-wild images. It synthesizes diverse, identity-preserved 3D heads with compelling dynamic details under fully disentangled control over camera poses, facial expressions, head shapes, and articulated neck and jaw poses. To achieve such a high level of disentangled control, we first explicitly define a novel semantic signed distance function (SDF) around a head geometry (FLAME) conditioned on the control parameters. This semantic SDF allows us to build a differentiable volumetric correspondence map from the observation space to a canonical space that is disentangled from all the control parameters. We then leverage a 3D-aware GAN framework (EG3D) to synthesize the detailed shape and appearance of full 3D heads in the canonical space, followed by a volume rendering step, guided by the volumetric correspondence map, that outputs into the observation space. To ensure control accuracy over the synthesized head shapes and expressions, we introduce a geometry prior loss that makes the output conform to the head SDF and a control loss that makes it conform to the expression code. Further, we enhance temporal realism with dynamic details conditioned on varying expressions and joint poses. Compared with state-of-the-art methods, our model synthesizes preferable identity-preserved 3D heads with compelling dynamic details, both qualitatively and quantitatively. We also provide an ablation study to justify many of our system design choices.
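The pipeline described above (warp observation-space samples into a canonical space, query a canonical field there, then volume-render) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `warp_to_canonical`, `canonical_density`, and `render_weights` are hypothetical, the nearest-vertex offset transfer is a crude stand-in for the semantic-SDF correspondence map, and a soft unit sphere replaces the EG3D generator. Only the compositing step follows the standard volume rendering quadrature.

```python
import numpy as np


def warp_to_canonical(points, posed_verts, canonical_verts):
    """Toy correspondence map (assumption, not the paper's method).

    Each observation-space point keeps its offset from the nearest posed
    vertex and is re-attached to the matching canonical vertex.
    """
    # Pairwise squared distances between points (P, 3) and vertices (V, 3).
    d2 = ((points[:, None, :] - posed_verts[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)
    return points - posed_verts[nearest] + canonical_verts[nearest]


def canonical_density(points, radius=1.0, sharpness=20.0):
    """Hypothetical canonical field: density of a soft sphere from its SDF."""
    sdf = np.linalg.norm(points, axis=-1) - radius
    # High density inside the surface (sdf < 0), near zero outside.
    return sharpness / (1.0 + np.exp(sharpness * sdf))


def render_weights(sigma, deltas):
    """Standard volume rendering weights: alpha times accumulated transmittance."""
    alpha = 1.0 - np.exp(-sigma * deltas)
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return alpha * transmittance


# Canonical "head": random points on a unit sphere. Posed head: shifted copy.
rng = np.random.default_rng(0)
canonical_verts = rng.normal(size=(256, 3))
canonical_verts /= np.linalg.norm(canonical_verts, axis=1, keepdims=True)
shift = np.array([0.5, 0.0, 0.0])
posed_verts = canonical_verts + shift

# One camera ray through the centre of the posed head, sampled uniformly.
t = np.linspace(0.0, 6.0, 64)
ray_points = np.array([0.5, 0.0, -3.0]) + t[:, None] * np.array([0.0, 0.0, 1.0])
deltas = np.full_like(t, t[1] - t[0])

# Observation space -> canonical space -> density -> compositing weights.
canonical_points = warp_to_canonical(ray_points, posed_verts, canonical_verts)
weights = render_weights(canonical_density(canonical_points), deltas)
opacity = weights.sum()
```

Because the toy deformation is a pure translation, the warp undoes it exactly and the ray sees the canonical sphere as if it had never moved. In the paper's setting the deformation depends on shape, expression, and joint poses, and the correspondence is derived from the semantic SDF rather than from nearest vertices.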

updated: Mon Mar 27 2023 18:36:53 GMT+0000 (UTC)

published: Mon Mar 27 2023 18:36:53 GMT+0000 (UTC)

arXiv
