Multi-frame sequence generator of 4D human body motion

Marsot Mathieu; Wuhrer Stefanie; Franco Jean-Sebastien; Durocher Stephane

We examine the problem of generating temporally and spatially dense 4D human body motion. On the one hand, generative modeling has been extensively studied as a per-time-frame static fitting problem for dense 3D models such as mesh representations, where the temporal aspect is left out of the generative model. On the other hand, temporal generative models exist for sparse human models such as marker-based capture representations but, to our knowledge, have not been extended to dense 3D shapes. We propose to bridge this gap with a generative auto-encoder-based framework that encodes morphology, global locomotion (including translation and rotation), and multi-frame temporal motion as a single latent space vector. To assess its generalization and factorization abilities, we train our model on a cyclic locomotion subset of AMASS, leveraging the dense surface models it provides for an extensive set of motion captures. Our results validate the ability of the model to reconstruct 4D sequences of human locomotion within a low error bound, and the meaningfulness of latent space interpolation between latent vectors representing different multi-frame sequences and locomotion types. We also illustrate the benefits of the approach for 4D human motion prediction of future frames from initial human locomotion frames, showing promising abilities of our model to learn realistic spatio-temporal features of human motion. Finally, we show that our model allows for completion of data that is sparse both spatially and temporally.
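The latent space interpolation mentioned in the abstract can be pictured with a small sketch. The abstract does not say which interpolation scheme is used, so plain linear interpolation is assumed here; the function name `interpolate_latents`, the 4-dimensional latent codes, and the labels `z_walk` and `z_run` are all hypothetical and chosen only for illustration. In a full pipeline, each interpolated vector would be passed to the decoder to obtain a multi-frame mesh sequence, a step that is not shown.

```python
from typing import List, Sequence


def interpolate_latents(
    z_a: Sequence[float], z_b: Sequence[float], steps: int
) -> List[List[float]]:
    """Return `steps` latent vectors evenly spaced on the segment from z_a to z_b.

    Assumption: linear interpolation; the paper may use a different scheme.
    """
    if len(z_a) != len(z_b):
        raise ValueError("latent vectors must have the same dimension")
    if steps < 2:
        raise ValueError("need at least two steps (the two endpoints)")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # t runs from 0.0 (z_a) to 1.0 (z_b)
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


# Hypothetical latent codes standing in for two encoded motion sequences.
z_walk = [0.0, 1.0, -2.0, 0.5]
z_run = [2.0, 1.0, 0.0, -0.5]
path = interpolate_latents(z_walk, z_run, steps=5)
```

Each entry of `path` is one latent vector; the first and last entries equal the two inputs, and the entries in between blend them in equal increments.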

updated: Mon Jun 07 2021 13:56:46 GMT+0000 (UTC)

published: Mon Jun 07 2021 13:56:46 GMT+0000 (UTC)

arXiv
