Synthesizing Photorealistic Virtual Humans Through Cross-modal Disentanglement

Siddarth Ravichandran; Ondřej Texler; Dimitar Dinev; Hyun Jae Kang

Over the last few decades, many aspects of human life have been enhanced with virtual domains, from the advent of digital assistants such as Amazon's Alexa and Apple's Siri to the latest metaverse efforts of the rebranded Meta. These trends underscore the importance of generating photorealistic visual depictions of humans, and they have led to the rapid growth of so-called deepfake and talking-head generation methods in recent years. Despite their impressive results and popularity, these methods usually fall short in certain qualitative aspects, such as texture quality, lip synchronization, or resolution, and in practical aspects, such as the ability to run in real time. To allow virtual human avatars to be used in practical scenarios, we propose an end-to-end framework for synthesizing high-quality virtual human faces capable of speech, with a special emphasis on performance. We introduce a novel network utilizing visemes as an intermediate audio representation and a novel data augmentation strategy employing a hierarchical image synthesis approach that allows disentanglement of the different modalities used to control the global head motion. Our method runs in real time and is able to deliver superior results compared to the current state of the art.

updated: Sat Sep 03 2022 03:56:49 GMT+0000 (UTC)

published: Sat Sep 03 2022 03:56:49 GMT+0000 (UTC)

arXiv
