We are More than Our Joints: Predicting how 3D Bodies Move

Yan Zhang; Michael J. Black; Siyu Tang

A key step towards understanding human behavior is the prediction of 3D human motion. Successful solutions have many applications in human tracking, HCI, and graphics. Most previous work focuses on predicting a time series of future 3D joint locations given a sequence of past 3D joint locations. This Euclidean formulation generally works better than predicting pose in terms of joint rotations. Body joint locations, however, do not fully constrain 3D human pose: they leave degrees of freedom undefined, which makes it hard to animate a realistic human from the joints alone. Note that the 3D joints can be viewed as a sparse point cloud, so human motion prediction can be seen as point cloud prediction. With this observation, we instead predict a sparse set of locations on the body surface that correspond to motion capture markers. Given such markers, we fit a parametric body model to recover the 3D shape and pose of the person. These sparse surface markers also carry detailed information about human movement that is not present in the joints, which increases the naturalness of the predicted motions. Using the AMASS dataset, we train MOJO, a novel variational autoencoder that generates motions from latent frequencies. MOJO preserves the full temporal resolution of the input motion, and sampling from the latent frequencies explicitly introduces high-frequency components into the generated motion. We note that motion prediction methods accumulate errors over time, resulting in joints or markers that diverge from true human bodies. To address this, we fit SMPL-X to the predictions at each time step, projecting the solution back onto the space of valid bodies. These valid markers are then propagated in time. Experiments show that our method produces state-of-the-art results and realistic 3D body animations. The code for research purposes is at https://yz-cnsdqz.github.io/MOJO/MOJO.html
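The predict-then-project idea in the abstract (predict markers, project them back onto the space of valid bodies, then propagate the projected markers) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the "body" is a single rigid segment with two markers, `predict_next` is a hypothetical constant-velocity predictor with noise standing in for the learned model, and `project_to_valid` restores the segment length as a stand-in for fitting SMPL-X.

```python
import math
import random


def predict_next(prev, curr, rng, noise=0.02):
    # Hypothetical predictor: constant-velocity extrapolation plus Gaussian
    # noise. It stands in for a learned motion model and drifts over time.
    return [tuple(2.0 * c - p + rng.gauss(0.0, noise) for p, c in zip(pm, cm))
            for pm, cm in zip(prev, curr)]


def segment_length(frame):
    # Distance between the two markers of one frame.
    a, b = frame
    return math.dist(a, b)


def project_to_valid(frame, length=1.0):
    # Toy stand-in for body-model fitting: keep the midpoint of the two
    # markers and rescale the segment so it has the valid length again.
    a, b = frame
    d = math.dist(a, b)
    if d == 0.0:
        return frame  # degenerate case, direction undefined; leave unchanged
    mid = tuple((x + y) / 2.0 for x, y in zip(a, b))
    half = tuple(0.5 * length * (y - x) / d for x, y in zip(a, b))
    return [tuple(m - h for m, h in zip(mid, half)),
            tuple(m + h for m, h in zip(mid, half))]


def rollout(history, steps, project, seed=0):
    # Recursive prediction: each new frame is predicted from the last two
    # frames. With project=True the projected (valid) markers, not the raw
    # predictions, are what gets propagated to the next step.
    rng = random.Random(seed)
    frames = list(history)
    for _ in range(steps):
        nxt = predict_next(frames[-2], frames[-1], rng)
        if project:
            nxt = project_to_valid(nxt)
        frames.append(nxt)
    return frames[len(history):]


if __name__ == "__main__":
    history = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
               [(0.01, 0.0, 0.0), (1.01, 0.0, 0.0)]]
    raw = rollout(history, 50, project=False)
    fixed = rollout(history, 50, project=True)
    print("max length error without projection:",
          max(abs(segment_length(f) - 1.0) for f in raw))
    print("max length error with projection:",
          max(abs(segment_length(f) - 1.0) for f in fixed))
```

In this toy setting the projected rollout keeps the segment length fixed at every step, while the unprojected rollout is free to drift; a real system would replace the length constraint with a full parametric body fit.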

updated: Tue Dec 01 2020 16:41:04 GMT+0000 (UTC)

published: Tue Dec 01 2020 16:41:04 GMT+0000 (UTC)

arXiv
