AvatarPoser: Articulated Full-Body Pose Tracking from Sparse Motion Sensing

Jiaxi Jiang; Paul Streli; Huajian Qiu; Andreas Fender; Larissa Laich; Patrick Snape; Christian Holz

Today's Mixed Reality head-mounted displays track the user's head pose in world space as well as the user's hands for interaction in both Augmented Reality and Virtual Reality scenarios. While this is adequate to support user input, it unfortunately limits users' virtual representations to just their upper bodies. Current systems thus resort to floating avatars, whose limitation is particularly evident in collaborative settings. To estimate full-body poses from the sparse input sources, prior work has incorporated additional trackers and sensors at the pelvis or lower body, which increases setup complexity and limits practical application in mobile settings. In this paper, we present AvatarPoser, the first learning-based method that predicts full-body poses in world coordinates using only motion input from the user's head and hands. Our method builds on a Transformer encoder to extract deep features from the input signals and decouples global motion from the learned local joint orientations to guide pose estimation. To obtain accurate full-body motions that resemble motion capture animations, we refine the arm joints' positions using an optimization routine with inverse kinematics to match the original tracking input. In our evaluation, AvatarPoser achieved new state-of-the-art results on large motion capture datasets (AMASS). At the same time, our method's inference speed supports real-time operation, providing a practical interface to support holistic avatar control and representation for Metaverse applications.
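The inverse-kinematics refinement mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's implementation: it assumes a planar two-link arm with made-up segment lengths, and it uses plain gradient descent to nudge predicted shoulder and elbow angles until the hand reaches a tracked target position. All names (`forward_kinematics`, `refine_arm`, `UPPER_ARM`, `FOREARM`) and all parameter values are hypothetical.

```python
import math

# Assumed segment lengths in metres (illustrative values, not from the paper).
UPPER_ARM = 0.3
FOREARM = 0.3


def forward_kinematics(shoulder, elbow):
    """Hand position of a planar two-link arm with the shoulder at the origin."""
    x = UPPER_ARM * math.cos(shoulder) + FOREARM * math.cos(shoulder + elbow)
    y = UPPER_ARM * math.sin(shoulder) + FOREARM * math.sin(shoulder + elbow)
    return x, y


def refine_arm(shoulder, elbow, target, step=2.0, iterations=500):
    """Gradient descent on 0.5 * ||hand - target||^2, starting from predicted angles."""
    for _ in range(iterations):
        x, y = forward_kinematics(shoulder, elbow)
        ex, ey = x - target[0], y - target[1]
        # Partial derivatives of the hand position with respect to each angle.
        dx_ds, dy_ds = -y, x
        dx_de = -FOREARM * math.sin(shoulder + elbow)
        dy_de = FOREARM * math.cos(shoulder + elbow)
        # Chain rule: gradient of the loss with respect to each angle.
        shoulder -= step * (ex * dx_ds + ey * dy_ds)
        elbow -= step * (ex * dx_de + ey * dy_de)
    return shoulder, elbow


if __name__ == "__main__":
    # Stand-in for the tracked hand position (here generated from known angles).
    target = forward_kinematics(0.6, 1.0)
    # Stand-in for the network's slightly inaccurate arm prediction.
    s, e = refine_arm(0.5, 0.8, target)
    hx, hy = forward_kinematics(s, e)
    print("residual hand error:", math.hypot(hx - target[0], hy - target[1]))
```

Starting the optimization from the predicted angles rather than from scratch keeps the refined pose close to the prediction while closing the gap to the tracked hand, which is the general idea the abstract describes for the arm joints.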

updated: Wed Jul 27 2022 20:52:39 GMT+0000 (UTC)

published: Wed Jul 27 2022 20:52:39 GMT+0000 (UTC)

arXiv
