3D-TalkEmo: Learning to Synthesize 3D Emotional Talking Head

Qianyun Wang; Zhenfeng Fan; Shihong Xia

Audio-driven 3D facial animation has made impressive progress recently, but synthesizing 3D talking heads with rich emotion remains unsolved. This is due to the lack of 3D generative models and of available 3D emotional datasets with synchronized audio. To address this, we introduce 3D-TalkEmo, a deep neural network that generates 3D talking-head animation with various emotions. We also create a large 3D dataset with synchronized audio and video, a rich corpus, and various emotion states of different persons, built using sophisticated 3D face reconstruction methods. In the emotion generation network, we propose a novel 3D face representation structure, the geometry map, obtained by classical multi-dimensional scaling analysis. It maps the coordinates of vertices on a 3D face to a canonical image plane while preserving the vertex-to-vertex geodesic distance metric in a least-squares sense. This maintains the adjacency relationship of each vertex and provides an effective convolutional structure for the 3D facial surface. Taking a neutral 3D mesh and a speech signal as inputs, 3D-TalkEmo is able to generate vivid facial animations. Moreover, it allows the emotion state of the animated speaker to be changed. We present extensive quantitative and qualitative evaluations of our method, in addition to user studies, demonstrating that the generated talking heads are of significantly higher quality than those of previous state-of-the-art methods.
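
To illustrate the geometry-map idea described above, the following is a minimal sketch of classical multi-dimensional scaling applied to mesh vertices. It is not the authors' implementation. The function names `classical_mds` and `graph_geodesics` are hypothetical, and approximating geodesic distances by shortest paths along mesh edges is an assumption made here for brevity. Classical MDS is least-squares optimal with respect to the double-centered squared-distance matrix, which is one common reading of "in a least-squares sense".

```python
import numpy as np


def graph_geodesics(vertices, edges):
    """Approximate geodesic distances by shortest paths along mesh edges.

    Assumption: edge-path length stands in for the true surface geodesic.
    Uses a vectorized Floyd-Warshall update, so it only suits small meshes.
    """
    vertices = np.asarray(vertices, dtype=float)
    n = len(vertices)
    dist = np.full((n, n), np.inf)
    np.fill_diagonal(dist, 0.0)
    for i, j in edges:
        length = np.linalg.norm(vertices[i] - vertices[j])
        dist[i, j] = dist[j, i] = length
    for k in range(n):
        # Relax every pair (i, j) through intermediate vertex k.
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist


def classical_mds(dist, dim=2):
    """Embed n points in `dim` dimensions from an (n, n) distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centering turns squared distances into a Gram matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dim]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


# Toy usage: a folded strip of two quads (hypothetical data).
verts = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 1, 1], [1, 1, 1]]
edge_list = [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 5)]
plane_coords = classical_mds(graph_geodesics(verts, edge_list), dim=2)
```

Here `plane_coords` holds one 2D position per vertex. Rasterizing per-vertex 3D coordinates at those positions would give an image-like array that ordinary 2D convolutions can process, which is the role the abstract assigns to the geometry map.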

updated: Sun Apr 25 2021 02:48:19 GMT+0000 (UTC)

published: Sun Apr 25 2021 02:48:19 GMT+0000 (UTC)

arXiv
