TeCH: Text-guided Reconstruction of Lifelike Clothed Humans

Yangyi Huang; Hongwei Yi; Yuliang Xiu; Tingting Liao; Jiaxiang Tang; Deng Cai; Justus Thies

Despite recent advances in reconstructing clothed humans from a single image, accurately restoring the "unseen regions" with high-level detail remains an unsolved challenge that has received little attention. Existing methods often generate overly smooth back-side surfaces with blurry textures. But how can one effectively capture, from a single image, all the visual attributes of an individual that are needed to reconstruct unseen areas (e.g., the back view)? Motivated by the power of foundation models, TeCH reconstructs the 3D human by leveraging 1) descriptive text prompts (e.g., garments, colors, hairstyles), which are automatically generated via a garment parsing model and Visual Question Answering (VQA), and 2) a personalized fine-tuned Text-to-Image (T2I) diffusion model, which learns the "indescribable" appearance. To represent high-resolution 3D clothed humans at an affordable cost, we propose a hybrid 3D representation based on DMTet, which consists of an explicit body shape grid and an implicit distance field. Guided by the descriptive prompts and the personalized T2I diffusion model, the geometry and texture of the 3D human are optimized through multi-view Score Distillation Sampling (SDS) and reconstruction losses based on the original observation. TeCH produces high-fidelity 3D clothed humans with consistent, delicate texture and detailed full-body geometry. Quantitative and qualitative experiments demonstrate that TeCH outperforms state-of-the-art methods in terms of reconstruction accuracy and rendering quality. The code will be publicly available for research purposes at https://huangyangyi.github.io/TeCH
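
For readers unfamiliar with Score Distillation Sampling, the gradient it contributes is usually written in the form introduced by DreamFusion, reproduced below as a sketch. The abstract does not give TeCH's exact objective, so the symbols here are assumptions: $\theta$ denotes the 3D parameters, $x = g(\theta, c)$ a rendering from camera $c$, $y$ the text prompt, $\hat{\epsilon}_\phi$ the noise predicted by the diffusion model, and $w(t)$ a timestep weighting. The second line, with hypothetical weights $\lambda_{\text{SDS}}$ and $\lambda_{\text{recon}}$, only illustrates how an SDS term and a reconstruction term might be combined.

```latex
% Standard SDS gradient (DreamFusion form); notation is assumed, not taken from the abstract.
\nabla_\theta \mathcal{L}_{\text{SDS}}
  = \mathbb{E}_{t,\,\epsilon,\,c}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad x_t = \alpha_t\, x + \sigma_t\, \epsilon,\quad \epsilon \sim \mathcal{N}(0, I)

% Illustrative combination only; the weights are hypothetical placeholders.
\mathcal{L}_{\text{total}}
  = \lambda_{\text{SDS}}\,\mathcal{L}_{\text{SDS}}
  + \lambda_{\text{recon}}\,\mathcal{L}_{\text{recon}}
```

In this reading, the SDS term would supply guidance for views the input image does not cover, while the reconstruction term would tie the visible view to the original observation.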

updated: Sat Aug 19 2023 20:08:54 GMT+0000 (UTC)

published: Wed Aug 16 2023 17:59:13 GMT+0000 (UTC)

arXiv
