Neural Dubber: Dubbing for Silent Videos According to Scripts

Chenxu Hu; Qiao Tian; Tingle Li; Yuping Wang; Yuxuan Wang; Hang Zhao

Dubbing is a post-production process of re-recording actors' dialogue that is used extensively in filmmaking and video production. It is usually performed manually by professional voice actors, who read lines with proper prosody and in synchronization with the pre-recorded video. In this work, we propose Neural Dubber, the first neural network model to solve a novel automatic video dubbing (AVD) task: synthesizing human speech from text that is synchronized with a given silent video. Neural Dubber is a multi-modal text-to-speech (TTS) model that uses the lip movement in the video to control the prosody of the generated speech. Furthermore, an image-based speaker embedding (ISE) module is developed for the multi-speaker setting, which enables Neural Dubber to generate speech with a reasonable timbre according to the speaker's face. Experiments on the single-speaker chemistry lecture dataset and the multi-speaker LRS2 dataset show that Neural Dubber can generate speech on par with state-of-the-art TTS models in terms of speech quality. Most importantly, both qualitative and quantitative evaluations show that Neural Dubber can control the prosody of synthesized speech through the video and generate high-fidelity speech that is temporally synchronized with the video.
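The abstract does not say how lip movement is made to control prosody. One common way to tie a text sequence to a video timeline is scaled dot-product attention in which each video frame queries the text encodings, so the output has one vector per frame and is frame-synchronous by construction. The following is a minimal sketch under that assumption, not the authors' implementation; the function name `align_text_to_video` and the toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align_text_to_video(video_feats, text_feats):
    """Return one attention-weighted mix of text features per video frame.

    Assumption (not stated in the abstract): video frames act as queries,
    text (phoneme) encodings act as keys and values.
    """
    dim = len(text_feats[0])
    scale = math.sqrt(dim)
    aligned = []
    for query in video_feats:
        # Similarity of this frame to every text position.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in text_feats
        ]
        weights = softmax(scores)
        # Weighted sum of text features, one output vector per frame.
        mixed = [
            sum(w * key[i] for w, key in zip(weights, text_feats))
            for i in range(dim)
        ]
        aligned.append(mixed)
    return aligned


# Toy data: 5 video frames and 3 phonemes, each with 2-dimensional features.
video_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [-1.0, -1.0]]
text_feats = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
aligned = align_text_to_video(video_feats, text_feats)
print(len(aligned))
```

Because the output length equals the number of video frames rather than the number of phonemes, a decoder that consumes `aligned` would produce speech features on the video's timeline. In a real system the features would be learned encoder outputs, not hand-written vectors.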

updated: Fri Oct 15 2021 17:56:07 GMT+0000 (UTC)

published: Fri Oct 15 2021 17:56:07 GMT+0000 (UTC)

arXiv
