LipSync3D: Data-Efficient Learning of Personalized 3D Talking Faces from Video using Pose and Lighting Normalization

Avisek Lahiri; Vivek Kwatra; Christian Frueh; John Lewis; Chris Bregler

In this paper, we present a video-based learning framework for animating personalized 3D talking faces from audio. We introduce two training-time data normalizations that significantly improve data sample efficiency. First, we isolate and represent faces in a normalized space that decouples 3D geometry, head pose, and texture. This decomposes the prediction problem into regressions over the 3D face shape and the corresponding 2D texture atlas. Second, we leverage facial symmetry and approximate albedo constancy of skin to isolate and remove spatio-temporal lighting variations. Together, these normalizations allow simple networks to generate high-fidelity lip-sync videos under novel ambient illumination while training with just a single speaker-specific video. Further, to stabilize temporal dynamics, we introduce an auto-regressive approach that conditions the model on its previous visual state. Human ratings and objective metrics demonstrate that our method outperforms contemporary state-of-the-art audio-driven video reenactment benchmarks in terms of realism, lip-sync, and visual quality scores. We illustrate several applications enabled by our framework.
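
The auto-regressive conditioning mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' network: `autoregressive_rollout` and `toy_predict` are hypothetical names, and the toy predictor is only a stand-in that blends the current audio feature with the previous output. The point it shows is the feedback loop, in which each frame's prediction becomes part of the input for the next frame.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def autoregressive_rollout(
    predict: Callable[[Sequence[float], Sequence[float]], Vector],
    audio_features: Sequence[Sequence[float]],
    initial_state: Sequence[float],
) -> List[Vector]:
    """Run `predict` frame by frame, feeding each output back in as the next state."""
    state = list(initial_state)
    outputs: List[Vector] = []
    for audio in audio_features:
        # The model sees the current audio features AND its own previous output.
        state = list(predict(audio, state))
        outputs.append(state)
    return outputs


def toy_predict(audio: Sequence[float], prev_state: Sequence[float]) -> Vector:
    # Hypothetical stand-in for a trained model (an assumption for illustration):
    # average the audio-driven target with the previous visual state.
    return [0.5 * a + 0.5 * p for a, p in zip(audio, prev_state)]


# Three frames of one-dimensional "audio features", starting from a zero state.
frames = autoregressive_rollout(toy_predict, [[1.0], [1.0], [0.0]], [0.0])
print(frames)  # [[0.5], [0.75], [0.375]]
```

In this toy setting the output moves gradually toward each new target instead of jumping to it, which is the kind of temporal smoothing that conditioning on the previous state is meant to encourage.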

updated: Tue Jun 08 2021 08:56:40 GMT+0000 (UTC)

published: Tue Jun 08 2021 08:56:40 GMT+0000 (UTC)

arXiv
