One-shot Implicit Animatable Avatars with Model-based Priors

Yangyi Huang; Hongwei Yi; Weiyang Liu; Haofan Wang; Boxi Wu; Wenxiao Wang; Binbin Lin; Debing Zhang; Deng Cai

Existing neural rendering methods for creating human avatars typically either require dense input signals such as video or multi-view images, or leverage a learned prior from large-scale specific 3D human datasets such that reconstruction can be performed with sparse-view inputs. Most of these methods fail to achieve realistic reconstruction when only a single image is available. To enable the data-efficient creation of realistic animatable 3D humans, we propose ELICIT, a novel method for learning human-specific neural radiance fields from a single image. Inspired by the fact that humans can easily reconstruct the body geometry and infer the full-body clothing from a single image, we leverage two priors in ELICIT: a 3D geometry prior and a visual semantic prior. Specifically, ELICIT introduces the 3D body shape geometry prior from a skinned vertex-based template model (i.e., SMPL) and implements the visual clothing semantic prior with CLIP-based pre-trained models. Both priors are used to jointly guide the optimization toward plausible content in the invisible areas. To further improve visual details, we propose a segmentation-based sampling strategy that locally refines different parts of the avatar. Comprehensive evaluations on multiple popular benchmarks, including ZJU-MoCAP, Human3.6M, and DeepFashion, show that ELICIT outperforms current state-of-the-art avatar creation methods when only a single image is available. Code will be made public for research purposes at https://elicit3d.github.io .
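
As a rough illustration of how a visual semantic prior could guide optimization, the sketch below scores a rendered view against the input image by the cosine distance between their embeddings. This is a minimal sketch under stated assumptions, not ELICIT's actual loss: the abstract does not specify the objective, the function names `cosine_similarity` and `semantic_prior_loss` are hypothetical, and the short vectors are toy stand-ins for embeddings that a CLIP image encoder would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return dot / (norm_a * norm_b)


def semantic_prior_loss(rendered_embedding, reference_embedding):
    """Hypothetical semantic loss: 1 - cosine similarity.

    It approaches 0 when the rendered view and the reference image
    have embeddings pointing in the same direction.
    """
    return 1.0 - cosine_similarity(rendered_embedding, reference_embedding)


# Toy vectors standing in for image embeddings (assumption: real
# embeddings would come from a pre-trained CLIP image encoder).
reference = [0.2, 0.9, 0.1]
rendered_close = [0.25, 0.85, 0.05]  # semantically similar render
rendered_far = [0.9, 0.1, 0.3]       # semantically dissimilar render

loss_close = semantic_prior_loss(rendered_close, reference)
loss_far = semantic_prior_loss(rendered_far, reference)
```

Under this assumption, a render of an unseen view whose embedding drifts away from that of the input image receives a larger loss, which is the kind of signal that could push invisible regions toward plausible clothing.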

updated: Mon Dec 05 2022 18:24:06 GMT+0000 (UTC)

published: Mon Dec 05 2022 18:24:06 GMT+0000 (UTC)

arXiv
