DialogueNeRF: Towards Realistic Avatar Face-to-face Conversation Video Generation

Zanwei Zhou; Zi Wang; Shunyu Yao; Yichao Yan; Chen Yang; Guangtao Zhai; Junchi Yan; Xiaokang Yang

Conversation is an essential component of virtual avatar activities in the metaverse. With the development of natural language processing, textual and vocal conversation generation has achieved significant breakthroughs. Face-to-face conversations account for the vast majority of daily conversations, yet generating them has not received enough attention. In this paper, we propose a novel task that aims to generate realistic face-to-face conversations between human avatars, and we present a new dataset for exploring it. To tackle this task, we propose a new framework that uses a series of conversation signals, e.g. audio, head pose, and expression, to synthesize face-to-face conversation videos between human avatars, with all the interlocutors modeled within the same network. Our method is evaluated through quantitative and qualitative experiments on several aspects, e.g. image quality, pose sequence trend, and naturalness of the rendered videos. All the code, data, and models will be made publicly available.

updated: Tue Mar 15 2022 14:16:49 GMT+0000 (UTC)

published: Tue Mar 15 2022 14:16:49 GMT+0000 (UTC)

arXiv
