AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation

Yifei Zeng; Yuanxun Lu; Xinya Ji; Yao Yao; Hao Zhu; Xun Cao

We introduce AvatarBooth, a novel method for generating high-quality 3D avatars from text prompts or specific images. Unlike previous approaches that can only synthesize avatars from simple text descriptions, our method creates personalized avatars from casually captured face or body images, while still supporting text-based model generation and editing. Our key contribution is precise control over avatar generation, achieved by using two diffusion models fine-tuned separately for the human face and the human body. This enables us to capture intricate details of facial appearance, clothing, and accessories, resulting in highly realistic avatars. Furthermore, we introduce a pose-consistent constraint into the optimization process, which enhances the multi-view consistency of head images synthesized by the diffusion model and thus eliminates interference from uncontrolled human poses. In addition, we present a multi-resolution rendering strategy that enables coarse-to-fine supervision of 3D avatar generation, thereby improving the performance of the proposed system. The resulting avatar model can be further edited with additional text descriptions and driven by motion sequences. Experiments show that AvatarBooth outperforms previous text-to-3D methods in rendering and geometric quality, whether generating from text prompts or from specific images. Please check our project website at https://zeng-yifei.github.io/avatarbooth_page/.
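
The abstract does not say how the multi-resolution rendering strategy schedules its resolutions. As a purely illustrative sketch, one common way to realize coarse-to-fine supervision is to render at a low resolution early in optimization and step up to higher resolutions later. The function name `resolution_schedule`, the equal-length stages, and the resolution values are hypothetical assumptions, not AvatarBooth's actual settings.

```python
def resolution_schedule(step: int, total_steps: int,
                        resolutions: tuple[int, ...] = (64, 128, 256, 512)) -> int:
    """Return the render resolution for an optimization step.

    Hypothetical coarse-to-fine schedule (assumption, not the paper's):
    training is split into equal-length stages, one per resolution,
    ordered from coarsest to finest.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")
    # Integer division maps step in [0, total_steps) to a stage in [0, len(resolutions)).
    stage = step * len(resolutions) // total_steps
    return resolutions[stage]


if __name__ == "__main__":
    for s in (0, 2500, 5000, 9999):
        print(s, resolution_schedule(s, 10000))
```

With 10,000 steps and four stages, steps 0, 2500, 5000, and 9999 render at 64, 128, 256, and 512 pixels respectively. Equal-length stages are only the simplest choice; a real system might spend more steps at coarse resolutions or blend losses across resolutions.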

updated: Fri Jun 16 2023 14:18:51 GMT+0000 (UTC)

published: Fri Jun 16 2023 14:18:51 GMT+0000 (UTC)

arXiv
