Language-Guided Face Animation by Recurrent StyleGAN-based Generator

Tiankai Hang; Huan Yang; Bei Liu; Jianlong Fu; Xin Geng; Baining Guo

Recent work on language-guided image manipulation has shown the great power of language in providing rich semantics, especially for face images. However, motion, the other kind of information naturally carried by language, is less explored. In this paper, we leverage this motion information and study a novel task, language-guided face animation, which aims to animate a static face image with the help of language. To better utilize both the semantics and the motion in language, we propose a simple yet effective framework. Specifically, we propose a recurrent motion generator that extracts a sequence of semantic and motion information from the language and feeds it, along with visual information, to a pre-trained StyleGAN to generate high-quality frames. To optimize the framework, we propose three carefully designed loss functions: a regularization loss to preserve face identity, a path length regularization loss to ensure motion smoothness, and a contrastive loss to enable video synthesis with various language guidance in a single model. Extensive experiments with both qualitative and quantitative evaluations on diverse domains (e.g., human face, anime face, and dog face) demonstrate the superiority of our model in generating high-quality, realistic videos from one still image under the guidance of language. Code will be available at https://github.com/TiankaiHang/language-guided-animation.git.
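
The pipeline summarized in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the class `RecurrentMotionSketch`, the helper names, the tanh recurrent cell, and the three penalty functions are all hypothetical stand-ins chosen for illustration. The sketch assumes the input image has already been mapped to a latent vector `w0` and the sentence to an embedding `text_emb`; the pre-trained StyleGAN that would turn each latent into a frame is omitted. The penalties are generic proxies (distance to the starting latent, mean squared step between consecutive latents, and an InfoNCE-style contrastive term), not the loss definitions used in the paper.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; stands in for learned weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class RecurrentMotionSketch:
    """Hypothetical recurrent generator: emits one latent code per frame."""

    def __init__(self, latent_dim, text_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_h = rand_matrix(hidden_dim, hidden_dim, rng)   # hidden -> hidden
        self.w_t = rand_matrix(hidden_dim, text_dim, rng)     # text -> hidden
        self.w_w = rand_matrix(hidden_dim, latent_dim, rng)   # latent -> hidden
        self.w_out = rand_matrix(latent_dim, hidden_dim, rng)  # hidden -> offset

    def rollout(self, w0, text_emb, num_frames):
        """Return a list of latents, one per frame, starting from w0."""
        h = [0.0] * self.hidden_dim
        w = list(w0)
        latents = []
        for _ in range(num_frames):
            pre = [a + b + c for a, b, c in zip(
                matvec(self.w_h, h), matvec(self.w_t, text_emb), matvec(self.w_w, w))]
            h = [math.tanh(x) for x in pre]
            delta = matvec(self.w_out, h)
            w = [wi + di for wi, di in zip(w, delta)]  # residual latent update
            latents.append(w)
        return latents


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def identity_drift(latents, w0):
    """Proxy identity regularizer: mean squared distance to the start latent."""
    return sum(sq_dist(w, w0) for w in latents) / len(latents)


def smoothness_penalty(latents):
    """Proxy smoothness term: mean squared step between consecutive latents."""
    steps = [sq_dist(a, b) for a, b in zip(latents[1:], latents[:-1])]
    return sum(steps) / len(steps)


def cosine(u, v, eps=1e-8):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + eps)


def info_nce(anchor, candidates, positive_index, temperature=0.1):
    """InfoNCE-style loss: low when the anchor matches its positive candidate."""
    logits = [cosine(anchor, c) / temperature for c in candidates]
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum - logits[positive_index]


if __name__ == "__main__":
    gen = RecurrentMotionSketch(latent_dim=8, text_dim=4, hidden_dim=16)
    w0 = [0.0] * 8
    latents = gen.rollout(w0, text_emb=[1.0, 0.0, 0.5, -0.5], num_frames=5)
    print(len(latents), identity_drift(latents, w0), smoothness_penalty(latents))
```

In a real system each latent in the returned sequence would be decoded by the frozen StyleGAN into one video frame, and the contrastive term would compare features of the generated clip against embeddings of several candidate sentences rather than the toy vectors used here.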

updated: Wed Jul 03 2024 06:50:39 GMT+0000 (UTC)

published: Thu Aug 11 2022 02:57:30 GMT+0000 (UTC)

arXiv
