A structured latent space for human body motion generation

Mathieu Marsot; Stefanie Wuhrer; Jean-Sebastien Franco; Stephane Durocher

This work investigates learning a structured latent space to represent and generate temporally and spatially dense 4D human body motion. Once trained, the proposed model generates a multi-frame sequence of dense 3D meshes based on a single point in a low-dimensional latent space. Learning a generative model of human motion with an underlying structured latent space is important for a wide set of applications in computer vision and graphics, including virtual and augmented reality, 3D telepresence, and content generation for entertainment applications. We learn this latent motion representation in a data-driven framework that builds upon two existing lines of work. The first analyzes temporally dense skeletal data to capture the global displacement, poses, and temporal evolution of the motion, while the second analyzes static, densely captured 3D human scans to represent realistic 3D human body surfaces in a low-dimensional space. Building upon the respective advantages of these two concepts allows our model to simultaneously represent temporal motion information for sequences of varying duration and detailed 3D geometry at every time instant of the motion. We experimentally demonstrate that the resulting latent space is structured in the sense that similar motions form clusters in this space, and use our latent space to generate plausible interpolations between different actions. We also illustrate the benefits of the approach for 4D human motion completion, showing promising abilities of our model to learn spatio-temporal features of human motion.
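
The interpolation between actions mentioned in the abstract can be illustrated with a minimal sketch. It assumes that interpolation is a straight-line blend between two latent codes, which the abstract does not state; the function name `interpolate_latents`, the three-dimensional codes `z_walk` and `z_run`, and the step count are all hypothetical. In the described model, each interpolated code would then be passed to the trained decoder to obtain a mesh sequence; that decoder is not reproduced here.

```python
from typing import List, Sequence


def interpolate_latents(
    z_start: Sequence[float], z_end: Sequence[float], num_steps: int
) -> List[List[float]]:
    """Linearly blend two latent codes, endpoints included.

    Assumption: a straight-line path in latent space. The paper's
    actual interpolation scheme may differ.
    """
    if len(z_start) != len(z_end):
        raise ValueError("latent codes must have the same dimension")
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")

    path = []
    for step in range(num_steps):
        t = step / (num_steps - 1)  # blend weight running from 0.0 to 1.0
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


# Hypothetical latent codes standing in for two encoded motions.
z_walk = [0.0, 1.0, -2.0]
z_run = [4.0, 1.0, 2.0]

latent_path = interpolate_latents(z_walk, z_run, num_steps=5)
for z in latent_path:
    print(z)
```

Each entry of `latent_path` is a single latent point, matching the abstract's description of generating a full multi-frame sequence from one point in the latent space.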

updated: Mon Mar 07 2022 15:03:51 GMT+0000 (UTC)

published: Mon Jun 07 2021 13:56:46 GMT+0000 (UTC)

arXiv
