SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes

Soubhik Sanyal; Partha Ghosh; Jinlong Yang; Michael J. Black; Justus Thies; Timo Bolkart

We present SCULPT, a novel 3D generative model for clothed and textured 3D meshes of humans. Specifically, we devise a deep neural network that learns to represent the geometry and appearance distribution of clothed human bodies. Training such a model is challenging, as datasets of textured 3D meshes for humans are limited in size and accessibility. Our key observation is that there exist medium-sized 3D scan datasets like CAPE, as well as large-scale 2D image datasets of clothed humans, and that multiple appearances can be mapped to a single geometry. To effectively learn from the two data modalities, we propose an unpaired learning procedure for pose-dependent clothed and textured human meshes. Specifically, we learn a pose-dependent geometry space from 3D scan data, which we represent as per-vertex displacements with respect to the SMPL model. Next, we train a geometry-conditioned texture generator in an unsupervised way using the 2D image data. We use intermediate activations of the learned geometry model to condition our texture generator. To alleviate entanglement between pose and clothing type, and between pose and clothing appearance, we condition both the texture and geometry generators on attribute labels: clothing types for the geometry generator and clothing colors for the texture generator. We automatically generate these conditioning labels for the 2D images using the visual question answering model BLIP and CLIP. We validate our method on the SCULPT dataset and compare it to state-of-the-art 3D generative models for clothed human bodies. We will release the codebase for research purposes.
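The geometry representation described above, per-vertex displacements added to a body mesh together with attribute labels used for conditioning, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `CLOTHING_TYPES` vocabulary, and the random arrays standing in for the SMPL output and the geometry generator output are all assumptions made for illustration. Only the SMPL vertex count (6890) is a property of the actual model.

```python
import numpy as np

NUM_SMPL_VERTICES = 6890  # vertex count of the SMPL template mesh

# Hypothetical label vocabulary; the abstract only says "clothing types".
CLOTHING_TYPES = ["short sleeves", "long sleeves", "shorts", "long pants"]


def apply_displacements(body_vertices, displacements):
    """Add one 3D offset per vertex to a posed body mesh."""
    body_vertices = np.asarray(body_vertices, dtype=np.float64)
    displacements = np.asarray(displacements, dtype=np.float64)
    if body_vertices.shape != displacements.shape:
        raise ValueError("expected exactly one 3D displacement per vertex")
    return body_vertices + displacements


def one_hot(label, vocabulary):
    """Encode an attribute label as a one-hot conditioning vector."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(label)] = 1.0
    return vec


rng = np.random.default_rng(0)
# Stand-in for the posed SMPL body (a real pipeline would query SMPL here).
body = rng.normal(size=(NUM_SMPL_VERTICES, 3))
# Stand-in for the output of a pose- and label-conditioned geometry generator.
offsets = 0.01 * rng.normal(size=(NUM_SMPL_VERTICES, 3))

clothed = apply_displacements(body, offsets)
condition = one_hot("long pants", CLOTHING_TYPES)
```

Because the clothed mesh keeps the vertex count and ordering of the body mesh, a texture generator can in principle be trained against a fixed mesh layout, which is one reason a displacement representation is convenient here.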

updated: Mon Aug 21 2023 11:23:25 GMT+0000 (UTC)

published: Mon Aug 21 2023 11:23:25 GMT+0000 (UTC)

arXiv
