Normalized Avatar Synthesis Using StyleGAN and Perceptual Refinement

Huiwen Luo; Koki Nagano; Han-Wei Kung; Mclean Goldwhite; Qingguo Xu; Zejian Wang; Lingyu Wei; Liwen Hu; Hao Li

We introduce a highly robust GAN-based framework for digitizing a normalized 3D avatar of a person from a single unconstrained photo. While the input image can be of a smiling person or taken in extreme lighting conditions, our method can reliably produce a high-quality textured model of a person's face in neutral expression, with skin textures under diffuse lighting conditions. Cutting-edge 3D face reconstruction methods use non-linear morphable face models combined with GAN-based decoders to capture the likeness and details of a person, but they fail to produce neutral head models with unshaded albedo textures, which are critical for creating relightable and animation-friendly avatars for integration in virtual environments. The key challenge for existing methods is the lack of training and ground truth data containing normalized 3D faces. We propose a two-stage approach to address this problem. First, we adopt a highly robust normalized 3D face generator by embedding a non-linear morphable face model into a StyleGAN2 network. This allows us to generate detailed but normalized facial assets. This inference is then followed by a perceptual refinement step that uses the generated assets as regularization to cope with the limited available training samples of normalized faces. We further introduce a Normalized Face Dataset, which consists of a combination of photogrammetry scans, carefully selected photographs, and generated fake people with neutral expressions in diffuse lighting conditions. While our prepared dataset contains two orders of magnitude fewer subjects than those used by cutting-edge GAN-based 3D facial reconstruction methods, we show that it is possible to produce high-quality normalized face models for very challenging unconstrained input images, and we demonstrate superior performance to the current state of the art.
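The second stage, refinement regularized by the first-stage output, can be illustrated with a toy sketch. The abstract does not state the actual loss, networks, or optimizer, so everything below is an assumption made for illustration: a fixed linear map `G` stands in for the generator followed by a perceptual feature extractor, `w0` stands in for the latent code inferred in the first stage, and the names `refine_latent`, `objective`, and `lam` are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def objective(w, w0, target_feat, G, lam):
    """Toy loss: feature mismatch plus a penalty for drifting away from w0."""
    return float(np.sum((G @ w - target_feat) ** 2) + lam * np.sum((w - w0) ** 2))


def refine_latent(w0, target_feat, G, lam=0.1, steps=200):
    """Gradient descent on the toy loss, starting from the stage-one latent w0.

    G is an assumed linear stand-in for "generator + perceptual features";
    a real system would use neural networks and automatic differentiation.
    """
    # Step size 1/L, where L is the Lipschitz constant of the gradient,
    # so every step is guaranteed not to increase the loss.
    lipschitz = 2.0 * (np.linalg.norm(G, 2) ** 2 + lam)
    lr = 1.0 / lipschitz
    w = w0.copy()
    for _ in range(steps):
        residual = G @ w - target_feat
        grad = 2.0 * G.T @ residual + 2.0 * lam * (w - w0)
        w = w - lr * grad
    return w


# Synthetic data only: a random "feature map", a hidden latent, and a noisy
# stage-one estimate of that latent.
rng = np.random.default_rng(0)
G = rng.normal(size=(8, 4)) / np.sqrt(8)
w_true = rng.normal(size=4)
target_feat = G @ w_true
w0 = w_true + 0.5 * rng.normal(size=4)

w_ref = refine_latent(w0, target_feat, G)
loss_before = objective(w0, w0, target_feat, G, 0.1)
loss_after = objective(w_ref, w0, target_feat, G, 0.1)
print(loss_before, loss_after)
```

The regularization weight `lam` controls the trade-off in this sketch: a larger value keeps the refined latent close to the stage-one estimate, while a smaller value lets it follow the target features more freely.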

updated: Mon Jun 21 2021 21:57:16 GMT+0000 (UTC)

published: Mon Jun 21 2021 21:57:16 GMT+0000 (UTC)

arXiv
