Video-driven Neural Physically-based Facial Asset for Production

Longwen Zhang; Chuxiao Zeng; Qixuan Zhang; Hongyang Lin; Ruixiang Cao; Wei Yang; Lan Xu; Jingyi Yu

Production-level workflows for creating convincing 3D dynamic human faces have long relied on an assortment of labor-intensive tools for geometry and texture generation, motion capture and rigging, and expression synthesis. Recent neural approaches automate individual components, but their latent representations cannot give artists the explicit controls that conventional tools provide. In this paper, we present a new learning-based, video-driven approach for generating dynamic facial geometries with high-quality physically-based assets. For data collection, we construct a hybrid multiview-photometric capture stage, coupled with ultra-fast video cameras, to obtain raw 3D facial assets. We then model the facial expression, geometry, and physically-based textures using separate VAEs, and impose a global MLP-based expression mapping across the latent spaces of the respective networks to preserve characteristics across the respective attributes. We also model the delta information as wrinkle maps for the physically-based textures, achieving high-quality 4K dynamic textures. We demonstrate our approach in high-fidelity performer-specific facial capture and cross-identity facial motion retargeting. In addition, our multi-VAE-based neural asset, together with fast adaptation schemes, can be deployed to handle in-the-wild videos. We further illustrate the utility of our explicit facial disentangling strategy with various physically-based editing results of high realism. Comprehensive experiments show that our technique provides higher accuracy and visual fidelity than previous video-driven facial reconstruction and animation methods.
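The abstract does not specify the network details. The following is only a minimal NumPy sketch, under our own assumptions, of the decoding data flow it describes. A per-frame expression latent is mapped by MLPs into the geometry and texture latent spaces. Those latents are then decoded into vertex offsets and a wrinkle-map delta that is added to a neutral texture. Everything here is hypothetical: the latent sizes, the single-layer "decoders", the direction of the mapping, the names `decode_frame`, `expr_to_geo` and `expr_to_tex`, and the tiny texture resolution. The weights are random and the VAE encoders and training losses are omitted, so the sketch shows shapes and data flow, not the authors' networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes, rng):
    """Random weights and zero biases for a fully connected stack (hypothetical init)."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(x, layers):
    """Apply the stack: ReLU on hidden layers, linear output layer."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

EXPR_DIM, GEO_DIM, TEX_DIM = 32, 64, 64  # hypothetical latent sizes
N_VERTS = 1000                            # hypothetical mesh vertex count
TEX_RES = 16                              # tiny stand-in for a 4K texture

# Hypothetical global mapping from the expression latent to the other latent spaces.
expr_to_geo = init_mlp([EXPR_DIM, 128, GEO_DIM], rng)
expr_to_tex = init_mlp([EXPR_DIM, 128, TEX_DIM], rng)

# Stand-ins for the geometry and texture VAE decoders (one linear layer each).
geo_decoder = init_mlp([GEO_DIM, N_VERTS * 3], rng)
tex_decoder = init_mlp([TEX_DIM, TEX_RES * TEX_RES * 3], rng)

# Neutral-expression assets that the decoded deltas are added to.
neutral_verts = rng.normal(size=(N_VERTS, 3))
neutral_albedo = np.full((TEX_RES, TEX_RES, 3), 0.5)

def decode_frame(z_expr):
    """Map one expression latent to per-vertex positions and a dynamic texture."""
    z_geo = mlp(z_expr, expr_to_geo)
    z_tex = mlp(z_expr, expr_to_tex)
    verts = neutral_verts + mlp(z_geo, geo_decoder).reshape(N_VERTS, 3)
    wrinkle_delta = mlp(z_tex, tex_decoder).reshape(TEX_RES, TEX_RES, 3)
    albedo = np.clip(neutral_albedo + wrinkle_delta, 0.0, 1.0)  # neutral + wrinkle-map delta
    return verts, albedo

z_expr = rng.normal(size=EXPR_DIM)
verts, albedo = decode_frame(z_expr)
print(verts.shape, albedo.shape)  # (1000, 3) (16, 16, 3)
```

Because the biases are zero in this toy setup, a zero expression latent decodes back to the neutral geometry and texture. That gives a quick sanity check that the "delta on top of a neutral asset" structure is wired correctly.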

updated: Fri Sep 16 2022 07:26:39 GMT+0000 (UTC)

published: Fri Feb 11 2022 13:22:48 GMT+0000 (UTC)

arXiv