Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation

Yuanxun Lu; Jinxiang Chai; Xun Cao

To the best of our knowledge, we present the first live system that generates personalized photorealistic talking-head animation driven only by audio signals at over 30 fps. Our system consists of three stages. The first stage is a deep neural network that extracts deep audio features, followed by a manifold projection that maps the features to the target person's speech space. In the second stage, we learn facial dynamics and motions from the projected audio features. The predicted motions include head poses and upper-body motions; the former are generated by an autoregressive probabilistic model of the target person's head pose distribution, and the latter are deduced from the head poses. In the final stage, we generate conditional feature maps from the previous predictions and send them, together with a candidate image set, to an image-to-image translation network that synthesizes photorealistic renderings. Our method generalizes well to in-the-wild audio and synthesizes high-fidelity personalized facial details such as wrinkles and teeth. It also allows explicit control of head poses. Extensive qualitative and quantitative evaluations, along with user studies, demonstrate the superiority of our method over state-of-the-art techniques.
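The abstract does not say how the manifold projection in the first stage is computed. As a rough illustration of the idea only, the sketch below reconstructs an incoming audio feature from its nearest neighbours in a stored set of the target person's features, so the result stays inside the region that set spans. The inverse-distance weighting, the function name `project_to_speech_space`, the parameter `k`, and the toy 2-D features are all assumptions made for this example; they are not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def project_to_speech_space(feature: Vector,
                            database: List[Vector],
                            k: int = 3) -> List[float]:
    """Reconstruct `feature` from its k nearest neighbours in `database`.

    Hypothetical stand-in for a manifold projection: the output is a convex
    combination of stored target-speaker features, weighted by inverse
    Euclidean distance (an assumption, not the paper's formulation).
    """
    if not database:
        raise ValueError("database must not be empty")

    # Distance from the input feature to every stored feature, closest first.
    scored = sorted((math.dist(feature, row), i) for i, row in enumerate(database))
    nearest = scored[:k]

    # An exact match already lies in the target space; return a copy of it.
    if nearest[0][0] == 0.0:
        return list(database[nearest[0][1]])

    # Inverse-distance weights, normalised so they sum to 1.
    raw = [1.0 / d for d, _ in nearest]
    total = sum(raw)
    weights = [w / total for w in raw]

    dim = len(feature)
    return [
        sum(w * database[i][j] for w, (_, i) in zip(weights, nearest))
        for j in range(dim)
    ]


# Toy usage: four stored "target speaker" features and one incoming feature.
target_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
projected = project_to_speech_space([0.4, 0.3], target_features, k=3)
```

Because the weights are non-negative and sum to 1, the projected feature is always a blend of features the target person actually produced, which is the intuition behind mapping arbitrary input audio into a speaker-specific space.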

updated: Fri Sep 24 2021 06:26:47 GMT+0000 (UTC)

published: Wed Sep 22 2021 08:47:43 GMT+0000 (UTC)

arXiv
