Learning Dynamic Facial Radiance Fields for Few-Shot Talking Head Synthesis

Shuai Shen; Wanhua Li; Zheng Zhu; Yueqi Duan; Jie Zhou; Jiwen Lu

Talking head synthesis is an emerging technology with wide applications in film dubbing, virtual avatars and online education. Recent NeRF-based methods generate more natural talking videos because they better capture the 3D structural information of faces. However, they require a specific model to be trained for each identity on a large dataset. In this paper, we propose Dynamic Facial Radiance Fields (DFRF) for few-shot talking head synthesis, which can rapidly generalize to an unseen identity with little training data. Unlike existing NeRF-based methods, which directly encode the 3D geometry and appearance of a specific person into the network, our DFRF conditions the face radiance field on 2D appearance images to learn a face prior. The facial radiance field can thus be flexibly adjusted to a new identity with only a few reference images. Additionally, to better model facial deformations, we propose a differentiable face warping module, conditioned on audio signals, that deforms all reference images to the query space. Extensive experiments show that, given a training clip only tens of seconds long, our proposed DFRF can synthesize natural, high-quality audio-driven talking head videos for novel identities after only 40k iterations. We highly recommend that readers view our supplementary video for intuitive comparisons. Code is available at https://sstzal.github.io/DFRF/.
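To make the idea of conditioning a radiance field on 2D reference images more concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a pinhole camera, a single reference feature map, and a purely hypothetical linear map `W_off` that stands in for the learned, audio-conditioned warping module. All function and parameter names are illustrative. The sketch projects 3D query points into the reference image, shifts the projected pixels by an audio-dependent offset, and bilinearly samples the features there.

```python
import numpy as np

def project(points, K, R, t):
    """Pinhole projection of (N, 3) world points to (N, 2) pixel coords (x, y)."""
    cam = points @ R.T + t            # world frame -> camera frame
    uv = cam @ K.T                    # camera frame -> homogeneous pixels
    return uv[:, :2] / uv[:, 2:3]     # perspective divide

def bilinear_sample(feat, uv):
    """Sample an (H, W, C) feature map at continuous (x, y) pixel coords."""
    H, W, _ = feat.shape
    x = np.clip(uv[:, 0], 0, W - 1)
    y = np.clip(uv[:, 1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[:, None]
    wy = (y - y0)[:, None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def warped_reference_features(points, audio, feat, K, R, t, W_off):
    """Project points into the reference view, apply an audio-conditioned
    pixel offset, and sample reference features at the shifted locations.

    W_off is a hypothetical (3 + A, 2) linear map used here in place of a
    learned warping network; it is an assumption of this sketch.
    """
    uv = project(points, K, R, t)
    audio_tiled = np.broadcast_to(audio, (len(points), audio.shape[0]))
    inp = np.concatenate([points, audio_tiled], axis=1)
    offset = inp @ W_off              # (N, 2) per-point pixel offset
    return bilinear_sample(feat, uv + offset)
```

In a full system the sampled features would be fed, together with the query point and audio feature, into the network that predicts color and density. Because every step here is a smooth function of the offset (apart from clipping at the image border), the same structure stays differentiable when written in an autodiff framework.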

updated: Sun Jul 24 2022 16:46:03 GMT+0000 (UTC)

published: Sun Jul 24 2022 16:46:03 GMT+0000 (UTC)

arXiv
