PhysCap: Physically Plausible Monocular 3D Motion Capture in Real Time

Soshi Shimada; Vladislav Golyanik; Weipeng Xu; Christian Theobalt

Marker-less 3D human motion capture from a single colour camera has seen significant progress. However, it is a very challenging and severely ill-posed problem. In consequence, even the most accurate state-of-the-art approaches have significant limitations. Purely kinematic formulations on the basis of individual joints or skeletons, and the frequent frame-wise reconstruction in state-of-the-art methods greatly limit 3D accuracy and temporal stability compared to multi-view or marker-based motion capture. Further, captured 3D poses are often physically incorrect and biomechanically implausible, or exhibit implausible environment interactions (floor penetration, foot skating, unnatural body leaning and strong shifting in depth), which is problematic for any use case in computer graphics. We, therefore, present PhysCap, the first algorithm for physically plausible, real-time and marker-less human 3D motion capture with a single colour camera at 25 fps. Our algorithm first captures 3D human poses purely kinematically. To this end, a CNN infers 2D and 3D joint positions, and subsequently, an inverse kinematics step finds space-time coherent joint angles and global 3D pose. Next, these kinematic reconstructions are used as constraints in a real-time physics-based pose optimiser that accounts for environment constraints (e.g., collision handling and floor placement), gravity, and biophysical plausibility of human postures. Our approach employs a combination of ground reaction force and residual force for plausible root control, and uses a trained neural network to detect foot contact events in images. Our method captures physically plausible and temporally stable global 3D human motion, without physically implausible postures, floor penetrations or foot skating, from video in real time and in general scenes. The video is available at http://gvv.mpi-inf.mpg.de/projects/PhysCap
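To make two of the artefacts named in the abstract concrete (floor penetration and foot skating), the following is a minimal sketch of how per-frame foot contact labels could be used to clean up a kinematic foot trajectory. It is not the physics-based optimiser of PhysCap, which according to the abstract works with ground reaction and residual forces; it is a purely kinematic toy. The function name `enforce_foot_constraints`, the y-up coordinate convention, and the flat floor at `floor_height` are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (x, y, z) foot position; y is assumed to point up (illustrative convention).
Vec3 = Tuple[float, float, float]


def enforce_foot_constraints(
    positions: List[Vec3],
    in_contact: List[bool],
    floor_height: float = 0.0,
) -> List[Vec3]:
    """Toy clean-up of one foot trajectory (hypothetical helper).

    - The foot is never allowed below the floor plane (no floor penetration).
    - While a contact flag is set, the foot stays at the spot where the
      contact phase began (no foot skating).
    """
    corrected: List[Vec3] = []
    anchor: Optional[Tuple[float, float]] = None  # (x, z) of the planted foot

    for (x, y, z), contact in zip(positions, in_contact):
        # Clamp the height so the foot cannot sink through the floor.
        y = max(y, floor_height)
        if contact:
            if anchor is None:
                # First frame of a contact phase: remember where the foot landed.
                anchor = (x, z)
            # Pin the foot to the anchor and place it on the floor.
            x, z = anchor
            y = floor_height
        else:
            # Foot is in the air again: release the anchor.
            anchor = None
        corrected.append((x, y, z))

    return corrected


# Example: the foot dips below the floor and drifts while in contact.
track = [(0.0, 0.05, 0.0), (0.1, -0.02, 0.0), (0.13, -0.01, 0.01), (0.3, 0.1, 0.0)]
contact = [False, True, True, False]
cleaned = enforce_foot_constraints(track, contact)
```

In this sketch the contact flags are given as input; in the abstract they come from a trained neural network that detects foot contact events in the images. Pinning positions directly ignores dynamics, which is the gap a physics-based formulation is meant to close.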

updated: Wed Dec 09 2020 14:18:55 GMT+0000 (UTC)

published: Thu Aug 20 2020 10:46:32 GMT+0000 (UTC)

arXiv
