Dual-Space NeRF: Learning Animatable Avatars and Scene Lighting in Separate Spaces

Yihao Zhi; Shenhan Qian; Xinhao Yan; Shenghua Gao

Modeling the human body in a canonical space is common practice for capture and animation. With a neural radiance field (NeRF), however, learning a static NeRF in the canonical space is not enough: the lighting on the body changes as the person moves, even though the scene lighting is constant. Previous methods alleviate this lighting inconsistency by learning a per-frame embedding, but that does not generalize to unseen poses. Given that the lighting condition is static in the world space while the human body is consistent in the canonical space, we propose a dual-space NeRF that models the scene lighting and the human body with two MLPs in two separate spaces. To bridge these two spaces, previous methods mostly rely on the linear blend skinning (LBS) algorithm. However, the blending weights for LBS of a dynamic neural field are intractable and are therefore usually memorized by another MLP, which does not generalize to novel poses. Although it is possible to borrow the blending weights of a parametric mesh such as SMPL, the interpolation operation introduces more artifacts. In this paper, we propose to use barycentric mapping, which generalizes directly to unseen poses and, surprisingly, achieves better results than LBS with neural blending weights. Quantitative and qualitative results on the Human3.6M and ZJU-MoCap datasets show the effectiveness of our method.
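
To make the idea of a barycentric mapping between two spaces concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not spell out the exact formulation, so the sketch assumes one common variant in which a point is expressed by its barycentric coordinates on a nearby posed-mesh triangle plus a signed height along that triangle's normal, and is then rebuilt on the corresponding canonical triangle. The function names (`barycentric_coords`, `map_point`) are hypothetical, and how the nearby triangle is selected is left out.

```python
# Sketch only: barycentric mapping of a point between two corresponding
# triangles (e.g. posed mesh -> canonical mesh). Names are hypothetical.

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def unit(v):
    n = dot(v, v) ** 0.5
    return [x / n for x in v]

def barycentric_coords(p, tri):
    """Return ((u, v, w), h): barycentric coordinates of the projection of p
    onto the triangle's plane, and the signed height h of p above that plane."""
    a, b, c = tri
    e0, e1 = sub(b, a), sub(c, a)
    n = unit(cross(e0, e1))
    r = sub(p, a)
    h = dot(r, n)                                # signed distance along the normal
    q = [r[i] - h * n[i] for i in range(3)]      # in-plane component of p - a
    d00, d01, d11 = dot(e0, e0), dot(e0, e1), dot(e1, e1)
    d20, d21 = dot(q, e0), dot(q, e1)
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return (1.0 - v - w, v, w), h

def map_point(p, tri_src, tri_dst):
    """Rebuild p on tri_dst using the coordinates measured on tri_src."""
    (u, v, w), h = barycentric_coords(p, tri_src)
    a, b, c = tri_dst
    n = unit(cross(sub(b, a), sub(c, a)))
    return [u * a[i] + v * b[i] + w * c[i] + h * n[i] for i in range(3)]

# Usage: the destination triangle is the source rotated 90 degrees about x,
# so the mapped point follows the same rigid motion.
posed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
canonical = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
mapped = map_point((0.25, 0.25, 0.5), posed, canonical)
```

Because the coordinates depend only on the mesh triangle and not on a learned weight field, the same computation applies unchanged to a pose never seen in training, which is the generalization property the abstract attributes to barycentric mapping.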

updated: Wed Aug 31 2022 13:35:04 GMT+0000 (UTC)

published: Wed Aug 31 2022 13:35:04 GMT+0000 (UTC)

arXiv
