SOMA: Solving Optical Marker-Based MoCap Automatically

Nima Ghorbani; Michael J. Black

Marker-based optical motion capture (mocap) is the "gold standard" method for acquiring accurate 3D human motion in computer vision, medicine, and graphics. The raw output of these systems is a set of noisy and incomplete 3D points or short tracklets of points. To be useful, one must associate these points with the corresponding markers on the captured subject, i.e., "labeling". Given these labels, one can then "solve" for the 3D skeleton or body surface mesh. Commercial auto-labeling tools require a specific calibration procedure at capture time, which is not possible for archival data. Here we train a novel neural network called SOMA, which takes raw mocap point clouds with varying numbers of points and labels them at scale without any calibration data, independent of the capture technology, and with only minimal human intervention. Our key insight is that, while labeling point clouds is highly ambiguous, the 3D body provides strong constraints on the solution that can be exploited by a learning-based method. To enable learning, we generate massive training sets of simulated noisy and ground-truth mocap markers animated by 3D bodies from AMASS. SOMA exploits an architecture with stacked self-attention elements to learn the spatial structure of the 3D body and an optimal transport layer to constrain the assignment (labeling) problem while rejecting outliers. We extensively evaluate SOMA both quantitatively and qualitatively. SOMA is more accurate and robust than existing state-of-the-art research methods and can be applied where commercial systems cannot. We automatically label over 8 hours of archival mocap data across 4 different datasets captured using various technologies and output SMPL-X body models. The model and data are released for research purposes at https://soma.is.tue.mpg.de/.
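To make the idea of an optimal transport layer that constrains the assignment while rejecting outliers more concrete, the following is a minimal sketch and not the authors' implementation. It assumes entropy-regularized optimal transport solved with Sinkhorn iterations, plus an extra "null" row and column that absorb unmatched points and unused labels; the abstract does not say which solver SOMA uses. The function name `sinkhorn_with_null`, the `null_score` parameter, and the toy score matrix are all hypothetical.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True)), axis=axis)


def sinkhorn_with_null(scores, null_score=0.0, n_iters=100):
    """Soft-assign points (rows) to labels (columns) with a null bin.

    scores: (n_points, n_labels) array, higher means a better match.
    Returns (plan, labels): plan is the (n_points + 1, n_labels + 1)
    transport plan, and labels[i] is the best column for point i, where
    the value n_labels means "null" (the point is treated as an outlier).
    """
    n_points, n_labels = scores.shape

    # Augment the score matrix with one null row and one null column.
    z = np.full((n_points + 1, n_labels + 1), null_score, dtype=float)
    z[:n_points, :n_labels] = scores

    # Target marginals (assumed): each point and each label carries unit
    # mass; the null row can absorb every label and the null column can
    # absorb every point.
    log_mu = np.log(np.concatenate([np.ones(n_points), [n_labels]]))
    log_nu = np.log(np.concatenate([np.ones(n_labels), [n_points]]))

    # Sinkhorn iterations in log space: alternately rescale rows and columns.
    u = np.zeros(n_points + 1)
    v = np.zeros(n_labels + 1)
    for _ in range(n_iters):
        u = log_mu - _logsumexp(z + v[None, :], axis=1)
        v = log_nu - _logsumexp(z + u[:, None], axis=0)

    plan = np.exp(z + u[:, None] + v[None, :])
    labels = plan[:n_points].argmax(axis=1)
    return plan, labels


# Toy example: points 0 and 1 clearly match labels 0 and 1;
# point 2 matches nothing well and should fall into the null bin.
toy_scores = np.array([[5.0, -5.0],
                       [-5.0, 5.0],
                       [-5.0, -5.0]])
plan, labels = sinkhorn_with_null(toy_scores)
print(labels)
```

In a learned system the score matrix would come from the network (here, presumably from the self-attention features), and the null bin is what lets spurious points be discarded instead of being forced onto a marker label.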

updated: Sat Oct 09 2021 02:27:27 GMT+0000 (UTC)

published: Sat Oct 09 2021 02:27:27 GMT+0000 (UTC)

arXiv
