Talking Face Generation with Multilingual TTS

Hyoung-Kyu Song; Sang Hoon Woo; Junhyeok Lee; Seungmin Yang; Hyunjae Cho; Youseong Lee; Dongho Choi; Kang-wook Kim

In this work, we propose a joint system that combines a talking face generation system with a text-to-speech system and can generate multilingual talking face videos from text input alone. Our system can synthesize natural multilingual speech while maintaining the vocal identity of the speaker, as well as lip movements synchronized to the synthesized speech. We demonstrate the generalization capabilities of our system on four languages (Korean, English, Japanese, and Chinese), each from a different language family. We also compare the outputs of our talking face generation model to those of a prior work that claims multilingual support. For our demo, we add a translation API to the preprocessing stage and present the system as a neural dubber, so that users can more easily exploit its multilingual capabilities.
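The order of stages described above (optional translation, then speaker-preserving multilingual TTS, then lip-synced face generation) can be sketched as a simple composition. This is a minimal, hypothetical sketch of the data flow only: the stage signatures, the `NeuralDubber` class, the ISO 639-1 language codes, and the placeholder stage functions are assumptions for illustration, not the authors' models or APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Languages demonstrated in the abstract; ISO 639-1 codes are an assumption made here.
SUPPORTED_LANGUAGES = {"ko", "en", "ja", "zh"}

# Hypothetical stage signatures (the paper does not specify these interfaces).
Translator = Callable[[str, str], str]                          # (text, target_lang) -> text
Synthesizer = Callable[[str, str, str], List[float]]            # (text, lang, speaker_id) -> waveform
FaceGenerator = Callable[[List[float], List[str]], List[str]]   # (waveform, reference frames) -> frames


@dataclass
class NeuralDubber:
    """Hypothetical wrapper chaining the three stages in order."""
    synthesize: Synthesizer
    generate_faces: FaceGenerator
    translate: Optional[Translator] = None  # optional preprocessing stage, as in the demo

    def dub(self, text: str, target_lang: str, speaker_id: str,
            reference_frames: List[str]) -> List[str]:
        if target_lang not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language: {target_lang}")
        # 1. Optionally translate the input text into the target language.
        if self.translate is not None:
            text = self.translate(text, target_lang)
        # 2. Synthesize speech in the target language, conditioned on the speaker's identity.
        waveform = self.synthesize(text, target_lang, speaker_id)
        # 3. Generate video frames whose lip movements follow the synthesized waveform.
        return self.generate_faces(waveform, reference_frames)


# Placeholder stages standing in for real models (hypothetical, for illustration only).
def fake_translate(text: str, lang: str) -> str:
    return f"[{lang}] {text}"


def fake_tts(text: str, lang: str, speaker_id: str) -> List[float]:
    return [0.0] * len(text)  # one dummy sample per character


def fake_faces(waveform: List[float], refs: List[str]) -> List[str]:
    # One output frame per 4 samples, cycling through the reference frames.
    return [refs[i % len(refs)] for i in range(len(waveform) // 4)]


dubber = NeuralDubber(synthesize=fake_tts, generate_faces=fake_faces,
                      translate=fake_translate)
frames = dubber.dub("hello", "ja", "spk0", ["f0", "f1"])
```

Keeping translation optional mirrors the abstract's framing: the core system goes from text to speech to video, and the translation API is only a preprocessing step added for the dubbing demo.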

updated: Fri May 13 2022 02:08:35 GMT+0000 (UTC)

published: Fri May 13 2022 02:08:35 GMT+0000 (UTC)

arXiv
