Get3DHuman: Lifting StyleGAN-Human into a 3D Generative Model using Pixel-aligned Reconstruction Priors

Zhangyang Xiong; Di Kang; Derong Jin; Weikai Chen; Linchao Bao; Shuguang Cui; Xiaoguang Han

Fast generation of high-quality 3D digital humans is important to a vast number of applications ranging from entertainment to professional concerns. Recent advances in differentiable rendering have enabled the training of 3D generative models without requiring 3D ground truths. However, the quality of the generated 3D humans still has much room to improve in terms of both fidelity and diversity. In this paper, we present Get3DHuman, a novel 3D human framework that can significantly boost the realism and diversity of the generated outcomes by only using a limited budget of 3D ground-truth data. Our key observation is that the 3D generator can profit from human-related priors learned through 2D human generators and 3D reconstructors. Specifically, we bridge the latent space of Get3DHuman with that of StyleGAN-Human via a specially-designed prior network, where the input latent code is mapped to the shape and texture feature volumes spanned by the pixel-aligned 3D reconstructor. The outcomes of the prior network are then leveraged as the supervisory signals for the main generator network. To ensure effective training, we further propose three tailored losses applied to the generated feature volumes and the intermediate feature maps. Extensive experiments demonstrate that Get3DHuman greatly outperforms the other state-of-the-art approaches and can support a wide range of applications including shape interpolation, shape re-texturing, and single-view reconstruction through latent inversion.
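The abstract states that the prior network's outputs (shape and texture feature volumes) act as supervisory signals for the main generator. The sketch below illustrates only that supervision flow, in a teacher/student style, under heavy simplifying assumptions. It is not the authors' implementation. The "prior network" here is a frozen random linear map standing in for StyleGAN-Human plus the pixel-aligned reconstructor. The "generator" is a trainable linear map. All sizes and the learning rate are hypothetical, and a plain squared-L2 loss replaces the paper's three tailored losses, which the abstract does not specify and which also act on intermediate feature maps.

```python
import numpy as np

# Hypothetical sizes for illustration only; the abstract specifies none.
LATENT_DIM = 16
SHAPE_VOL = (2, 4, 4, 4)  # (channels, depth, height, width)
TEX_VOL = (3, 4, 4, 4)

rng = np.random.default_rng(0)


def flat(shape):
    return int(np.prod(shape))


# Frozen stand-in for the prior network (StyleGAN-Human + pixel-aligned
# reconstructor): a fixed random linear map from latent code to volumes.
PRIOR = {
    "shape": rng.standard_normal((LATENT_DIM, flat(SHAPE_VOL))),
    "texture": rng.standard_normal((LATENT_DIM, flat(TEX_VOL))),
}

# Trainable stand-in for the main generator, initialised at zero.
GEN = {k: np.zeros_like(w) for k, w in PRIOR.items()}


def volumes(params, z):
    """Map latents (B, LATENT_DIM) to shape and texture feature volumes."""
    return {
        "shape": (z @ params["shape"]).reshape(-1, *SHAPE_VOL),
        "texture": (z @ params["texture"]).reshape(-1, *TEX_VOL),
    }


def train_step(z, lr=0.05):
    """One gradient step pulling GEN's volumes toward the frozen PRIOR's."""
    target = volumes(PRIOR, z)  # supervisory signal; PRIOR is never updated
    pred = volumes(GEN, z)
    batch = z.shape[0]
    loss = 0.0
    for key in GEN:
        diff = (pred[key] - target[key]).reshape(batch, -1)
        # Assumed loss: per-sample squared L2 distance, averaged over the batch.
        loss += float(np.sum(diff ** 2)) / batch
        grad = 2.0 * z.T @ diff / batch  # gradient of that loss w.r.t. GEN[key]
        GEN[key] -= lr * grad
    return loss


losses = [train_step(rng.standard_normal((64, LATENT_DIM))) for _ in range(200)]
print(f"first loss {losses[0]:.1f}, last loss {losses[-1]:.6f}")
```

Because the student only has to match targets that the teacher produces from the same latent code, every training pair is generated on the fly, so no 3D ground truth is consumed at this stage. This is one plausible reading of how such priors can reduce the need for 3D data.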

updated: Mon Mar 13 2023 12:37:15 GMT+0000 (UTC)

published: Thu Feb 02 2023 15:37:46 GMT+0000 (UTC)

arXiv
