One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field

Weichuang Li; Longhao Zhang; Dong Wang; Bin Zhao; Zhigang Wang; Mulin Chen; Bang Zhang; Zhongjian Wang; Liefeng Bo; Xuelong Li

Talking-head generation aims to synthesize faces that preserve the identity of the source image while imitating the motion of the driving image. Most pioneering methods rely primarily on 2D representations and therefore suffer from face distortion under large head rotations. Recent works instead employ explicit 3D structural representations or implicit neural rendering to improve performance under large pose changes. Nevertheless, the fidelity of identity and expression remains unsatisfactory, especially for novel-view synthesis. In this paper, we propose HiDe-NeRF, which achieves high-fidelity, free-view talking-head synthesis. Drawing on the recently proposed Deformable Neural Radiance Fields, HiDe-NeRF decomposes the 3D dynamic scene into a canonical appearance field and an implicit deformation field, where the former represents the canonical source face and the latter models the driving pose and expression. In particular, we improve fidelity in two respects: (i) to enhance identity expressiveness, we design a generalized appearance module that leverages multi-scale volume features to preserve face shape and details; (ii) to improve expression precision, we propose a lightweight deformation module that explicitly decouples pose and expression to enable precise expression modeling. Extensive experiments demonstrate that the proposed approach generates better results than previous works. Project page: https://www.waytron.net/hidenerf/
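The split into a canonical appearance field and a deformation field can be illustrated with a toy sketch. This is not the authors' architecture: the function names (`to_canonical`, `canonical_field`, `render_ray`), the analytic density and color, and the additive expression offset are all assumptions made for illustration. The sketch only shows the general deformable-NeRF pattern: warp each sample point from the driving (observed) space into canonical space, query the canonical field there, then composite along the ray with standard volume rendering.

```python
import numpy as np

def to_canonical(points, rotation, translation, expression_offset):
    """Hypothetical deformation field: observed space -> canonical space.

    Pose and expression are handled separately: first undo a rigid head
    pose, then subtract a per-point expression offset.
    Assumes x_obs = R @ x_can + t, so x_can = R^T @ (x_obs - t).
    With points stored as rows, R^T @ v becomes v @ R.
    """
    unposed = (points - translation) @ rotation
    return unposed - expression_offset(unposed)

def canonical_field(points):
    """Hypothetical canonical appearance field: a soft blob at the origin.

    A real model would use a learned network; this analytic stand-in
    returns a density per point and an RGB color in [0, 1].
    """
    density = 5.0 * np.exp(-np.sum(points ** 2, axis=-1))
    color = 1.0 / (1.0 + np.exp(-points))  # sigmoid of position
    return density, color

def render_ray(origin, direction, rotation, translation, expression_offset,
               near=0.0, far=4.0, n_samples=64):
    """Composite color along one ray with the usual NeRF quadrature."""
    t = np.linspace(near, far, n_samples)
    points = origin + t[:, None] * direction
    canonical_points = to_canonical(points, rotation, translation,
                                    expression_offset)
    density, color = canonical_field(canonical_points)
    delta = (far - near) / (n_samples - 1)      # uniform sample spacing
    alpha = 1.0 - np.exp(-density * delta)      # opacity of each segment
    # Transmittance: probability the ray reaches sample i unoccluded.
    transmittance = np.cumprod(np.append(1.0, 1.0 - alpha[:-1]))
    weights = transmittance * alpha
    return weights @ color, weights

# Usage: identity pose and no expression change, one ray along +z.
no_expression = lambda p: np.zeros_like(p)
rgb, weights = render_ray(np.array([0.0, 0.0, -2.0]),
                          np.array([0.0, 0.0, 1.0]),
                          np.eye(3), np.zeros(3), no_expression)
```

Keeping the rigid pose and the expression offset as separate steps mirrors, in a very reduced form, the abstract's point about decoupling pose from expression: changing `rotation` and `translation` re-poses the head without touching the expression term.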

updated: Tue Apr 11 2023 09:47:35 GMT+0000 (UTC)

published: Tue Apr 11 2023 09:47:35 GMT+0000 (UTC)

arXiv
