DeepMoCap: Deep Optical Motion Capture Using Multiple Depth Sensors and Retro-Reflectors

Anargyros Chatzitofis; Dimitrios Zarpalas; Stefanos Kollias; Petros Daras

This paper proposes DeepMoCap, a marker-based, single-person optical motion capture method that uses multiple spatio-temporally aligned infrared-depth sensors together with retro-reflective straps and patches (reflectors). DeepMoCap performs motion capture by automatically localizing and labeling the reflectors on depth images and, subsequently, in 3D space. A non-parametric representation is introduced to encode the temporal correlation among pairs of colorized depthmaps and 3D optical flow frames, and a multi-stage Fully Convolutional Network (FCN) architecture is proposed to jointly learn the reflector locations and their temporal dependency among sequential frames. The extracted 2D reflector locations are spatially mapped to 3D space, resulting in robust 3D optical data extraction. The subject's motion is then efficiently captured by applying a template-based fitting technique to the extracted optical data. Two datasets have been created and made publicly available for evaluation purposes: DMC2.5D, comprising multi-view depth and 3D optical flow annotated images, and DMC3D, consisting of spatio-temporally aligned multi-view depth images along with skeleton, inertial and ground truth MoCap data. The FCN model outperforms its competitors on the DMC2.5D dataset under the 2D Percentage of Correct Keypoints (PCK) metric, while the motion capture outcome is evaluated against RGB-D and inertial data fusion approaches on DMC3D, outperforming the next best method by 4.5% in total 3D PCK accuracy.
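The abstract reports results in terms of Percentage of Correct Keypoints (PCK). As a rough illustration of what such a score measures, the sketch below counts a keypoint as correct when its predicted position lies within a distance threshold of the ground truth. The function name `pck`, the sample coordinates, and the threshold value are all assumptions made for illustration; the abstract does not state the thresholds or normalization used in the paper's evaluation.

```python
import math


def pck(predicted, ground_truth, threshold):
    """Return the fraction of keypoints predicted within `threshold` of the ground truth.

    `predicted` and `ground_truth` are equal-length sequences of points
    (2D or 3D tuples). The threshold is in the same unit as the coordinates.
    This is a generic PCK definition, not the paper's exact protocol.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        raise ValueError("at least one keypoint is required")
    # A keypoint is "correct" when its Euclidean error does not exceed the threshold.
    correct = sum(
        1 for p, g in zip(predicted, ground_truth) if math.dist(p, g) <= threshold
    )
    return correct / len(predicted)


# Hypothetical 3D keypoints in metres (illustrative values only).
pred = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 0.0, 0.0), (0.0, 3.0, 0.0)]
gt = [(0.0, 0.0, 0.05), (1.0, 1.0, 1.3), (2.0, 0.0, 0.0), (0.0, 3.1, 0.0)]

score = pck(pred, gt, threshold=0.2)
print(f"3D PCK at 0.2 m: {score:.2f}")
```

The same function applies to 2D pixel coordinates by passing 2-tuples and a threshold in pixels, which mirrors the distinction the abstract draws between the 2D PCK used on DMC2.5D and the 3D PCK used on DMC3D.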

updated: Thu Oct 14 2021 11:40:26 GMT+0000 (UTC)

published: Thu Oct 14 2021 11:40:26 GMT+0000 (UTC)

arXiv
