OTAvatar: One-shot Talking Face Avatar with Controllable Tri-plane Rendering

Zhiyuan Ma; Xiangyu Zhu; Guojun Qi; Zhen Lei; Lei Zhang

Controllability, generalizability, and efficiency are the major objectives in constructing face avatars represented by a neural implicit field. However, existing methods have not managed to satisfy all three requirements simultaneously. They either focus on static portraits, restrict the representation ability to a specific subject, or suffer from substantial computational cost, which limits their flexibility. In this paper, we propose One-shot Talking face Avatar (OTAvatar), which constructs face avatars with a generalized, controllable tri-plane rendering solution so that each personalized avatar can be built from a single reference portrait. Specifically, OTAvatar first inverts a portrait image to a motion-free identity code. Second, the identity code and a motion code are used to modulate an efficient CNN that generates a tri-plane formulated volume, which encodes the subject in the desired motion. Finally, volume rendering is employed to generate an image from any view. The core of our solution is a novel decoupling-by-inverting strategy that disentangles identity and motion in the latent code via optimization-based inversion. Benefiting from the efficient tri-plane representation, we achieve controllable rendering of generalized face avatars at 35 FPS on an A100 GPU. Experiments show promising performance in cross-identity reenactment on subjects outside the training set, as well as better 3D consistency.
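The last two stages described above (querying a tri-plane volume, then volume rendering along a ray) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `bilinear`, `sample_triplane`, and `composite` are hypothetical, the planes hold a single scalar per cell rather than multi-channel features, the three plane features are simply summed, and the compositing step is the standard volume rendering quadrature rather than anything specific to OTAvatar. The CNN that generates the planes from the identity and motion codes is left out entirely.

```python
import math


def bilinear(plane, u, v):
    """Bilinearly sample a 2D grid plane[row][col] at (u, v) in [0, 1]^2."""
    h, w = len(plane), len(plane[0])
    # Clamp to the grid and convert to continuous pixel coordinates.
    x = min(max(u, 0.0), 1.0) * (w - 1)
    y = min(max(v, 0.0), 1.0) * (h - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = plane[y0][x0] * (1 - fx) + plane[y0][x1] * fx
    bottom = plane[y1][x0] * (1 - fx) + plane[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def sample_triplane(planes, point):
    """Project a 3D point in [0, 1]^3 onto the XY, XZ, and YZ planes
    and aggregate the three sampled features (assumption: by summation)."""
    x, y, z = point
    return (bilinear(planes["xy"], x, y)
            + bilinear(planes["xz"], x, z)
            + bilinear(planes["yz"], y, z))


def composite(densities, colors, deltas):
    """Standard volume rendering quadrature along one ray:
    pixel = sum_i T_i * alpha_i * c_i, with alpha_i = 1 - exp(-sigma_i * delta_i)
    and T_i the transmittance accumulated before sample i."""
    transmittance = 1.0
    pixel = 0.0
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        weights.append(weight)
        pixel += weight * color
        transmittance *= 1.0 - alpha
    return pixel, weights


if __name__ == "__main__":
    # Toy 2x2 planes; in a real system these would be generated by a network.
    planes = {
        "xy": [[0.0, 1.0], [2.0, 3.0]],
        "xz": [[1.0, 1.0], [1.0, 1.0]],
        "yz": [[0.5, 0.5], [0.5, 0.5]],
    }
    # Sample points along a ray; treat the tri-plane feature as a density
    # (a simplification: a decoder would normally map features to density and color).
    ray_points = [(0.5, 0.5, t / 4.0) for t in range(5)]
    densities = [sample_triplane(planes, p) for p in ray_points]
    colors = [0.2, 0.4, 0.6, 0.8, 1.0]
    deltas = [0.25] * len(ray_points)
    pixel, weights = composite(densities, colors, deltas)
    print(pixel, sum(weights))
```

The efficiency argument for tri-planes rests on this structure: each 3D query costs three 2D lookups rather than an evaluation of a large network per sample point, so most of the computation can be done once by a 2D CNN when the planes are generated.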

updated: Sun Mar 26 2023 09:12:03 GMT+0000 (UTC)

published: Sun Mar 26 2023 09:12:03 GMT+0000 (UTC)

arXiv
