Learning Audio-Driven Viseme Dynamics for 3D Face Animation

Linchao Bao; Haoxian Zhang; Yue Qian; Tangli Xue; Changhai Chen; Xuefei Zhe; Di Kang

We present a novel audio-driven facial animation approach that generates realistic lip-synchronized 3D facial animations from input audio. Our approach learns viseme dynamics from speech videos, produces animator-friendly viseme curves, and supports multilingual speech input. At its core is a novel parametric viseme fitting algorithm that uses phoneme priors to extract viseme parameters from speech videos. Because the fitting is guided by phonemes, the extracted viseme curves correlate better with the phonemes and are therefore more controllable and more familiar to animators. To support multilingual speech input and generalize to unseen voices, we use deep audio feature models pretrained on multiple languages to learn the mapping from audio to viseme curves. Our audio-to-curves mapping achieves state-of-the-art performance even when the input audio is distorted in volume, pitch, or speed, or is corrupted by noise. Lastly, we present a viseme scanning approach for acquiring high-fidelity viseme assets for efficient speech animation production. We show that the predicted viseme curves can be applied to different viseme-rigged characters to yield various personalized animations with realistic and natural facial motions. Our approach is artist-friendly and integrates easily into typical animation production workflows, including blendshape-based and bone-based animation.
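The abstract states that predicted viseme curves can drive blendshape-rigged characters. As a minimal sketch, assume a standard delta-blendshape rig: one neutral mesh plus one sculpted target mesh per viseme, with per-frame curve values used as weights. Under that assumption, each frame is the neutral mesh plus a weighted sum of per-viseme vertex offsets, v = v0 + Σ_k w_k (B_k − v0). The viseme names (`AA`, `UU`), the clamping of weights to [0, 1], and the function names are hypothetical illustrations, not the authors' implementation. A bone-based rig would instead map the same weights to joint transforms.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def apply_viseme_weights(
    neutral: List[Vec3],
    viseme_shapes: Dict[str, List[Vec3]],
    weights: Dict[str, float],
) -> List[Vec3]:
    """Delta blendshape evaluation: v = v0 + sum_k w_k * (B_k - v0)."""
    result = [list(v) for v in neutral]
    for name, w in weights.items():
        if w == 0.0:
            continue  # an inactive viseme contributes no offset
        target = viseme_shapes[name]
        for i, (base, tgt) in enumerate(zip(neutral, target)):
            for axis in range(3):
                result[i][axis] += w * (tgt[axis] - base[axis])
    return [(v[0], v[1], v[2]) for v in result]


def animate(
    neutral: List[Vec3],
    viseme_shapes: Dict[str, List[Vec3]],
    curves: Dict[str, List[float]],
) -> List[List[Vec3]]:
    """Evaluate one mesh per frame from per-viseme weight curves (hypothetical helper)."""
    if not curves:
        return []
    lengths = {len(c) for c in curves.values()}
    if len(lengths) != 1:
        raise ValueError("all viseme curves must have the same number of frames")
    num_frames = lengths.pop()
    frames = []
    for t in range(num_frames):
        # Assumption: weights are clamped to [0, 1], a common rig convention.
        weights = {name: min(max(c[t], 0.0), 1.0) for name, c in curves.items()}
        frames.append(apply_viseme_weights(neutral, viseme_shapes, weights))
    return frames


# Usage: a toy two-vertex "mouth" with two hypothetical visemes.
neutral = [(0.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
shapes = {
    "AA": [(0.0, 0.0, 0.0), (0.0, -2.0, 0.0)],  # jaw opens: lower vertex drops
    "UU": [(0.5, 0.0, 0.0), (0.0, -1.0, 0.0)],  # lips pucker: upper vertex shifts
}
frames = animate(neutral, shapes, {"AA": [0.0, 0.5, 1.0], "UU": [0.0, 0.0, 1.0]})
```

Evaluating the curves per frame keeps the rig independent of how the curves were produced. Curves predicted from audio or keyed by hand by an animator are consumed the same way, which is what makes a curve-based output easy to edit downstream.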

updated: Sun Jan 15 2023 09:55:46 GMT+0000 (UTC)

published: Sun Jan 15 2023 09:55:46 GMT+0000 (UTC)

arXiv