SoMoFormer: Multi-Person Pose Forecasting with Transformers

Edward Vendrow; Satyajit Kumar; Ehsan Adeli; Hamid Rezatofighi

Human pose forecasting is a challenging problem involving complex human body motion and posture dynamics. When multiple people are present in the environment, each person's motion may also be influenced by the motion of others. Although several previous works target multi-person dynamic pose forecasting, they often model the entire pose sequence as a time series (ignoring the underlying relationships between joints) or output the future pose sequence of only one person at a time. In this paper, we present a new method, called Social Motion Transformer (SoMoFormer), for multi-person 3D pose forecasting. Our transformer architecture uniquely models human motion input as a joint sequence rather than a time sequence, allowing us to perform attention over joints while predicting an entire future motion sequence for each joint in parallel. We show that with this problem reformulation, SoMoFormer naturally extends to multi-person scenes by using the joints of all people in a scene as input queries. Using learned embeddings to denote the type of joint, person identity, and global position, our model learns the relationships between joints and between people, attending more strongly to joints from the same or nearby people. SoMoFormer outperforms state-of-the-art methods for long-term motion prediction on the SoMoF benchmark as well as on the CMU-Mocap and MuPoTS-3D datasets. Code will be made available after publication.
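The joint-sequence formulation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, which had not been released when the abstract was written. It only shows how a scene of several people might be turned into one token per (person, joint), each holding that joint's whole observed trajectory, before attention is applied across all tokens. The function names (`build_joint_tokens`, `add_embeddings`, `self_attention`), the random stand-ins for learned embeddings, and the identity attention projections are all assumptions made for illustration. The global position embedding and the decoding of future frames are omitted.

```python
import math
import random


def build_joint_tokens(poses):
    """poses[p][t][j] = (x, y, z). Returns one token per (person, joint)."""
    tokens = []
    for person_id, person in enumerate(poses):
        num_joints = len(person[0])
        for joint_id in range(num_joints):
            # Flatten this joint's whole observed trajectory into one vector,
            # so the sequence axis is joints rather than time steps.
            trajectory = [c for frame in person for c in frame[joint_id]]
            tokens.append(
                {"person": person_id, "joint": joint_id, "features": trajectory}
            )
    return tokens


def add_embeddings(tokens, joint_emb, person_emb):
    """Add joint-type and person-identity vectors to each token (hypothetical)."""
    return [
        [
            f + joint_emb[tok["joint"]][i] + person_emb[tok["person"]][i]
            for i, f in enumerate(tok["features"])
        ]
        for tok in tokens
    ]


def self_attention(vectors):
    """Toy single-head scaled dot-product attention with identity projections."""
    dim = len(vectors[0])
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in vectors]
        peak = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append(
            [sum(w / total * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        )
    return out


# Toy scene (assumed sizes): 2 people, 4 observed frames, 3 joints each.
rng = random.Random(0)
P, T, J = 2, 4, 3
poses = [
    [[tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(J)] for _ in range(T)]
    for _ in range(P)
]
tokens = build_joint_tokens(poses)
dim = T * 3
# Random stand-ins for embeddings that would be learned during training.
joint_emb = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(J)]
person_emb = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(P)]
attended = self_attention(add_embeddings(tokens, joint_emb, person_emb))
print(len(attended), len(attended[0]))  # P*J tokens, each of width T*3
```

Because every joint of every person is a token in the same sequence, adding a person only lengthens the sequence, which is one way to read the abstract's claim that the formulation extends naturally to multi-person scenes. A full model would presumably project each token to a model dimension, stack several attention layers, and map each output token to that joint's future trajectory. Those details are not specified in the abstract.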

updated: Tue Aug 30 2022 06:59:28 GMT+0000 (UTC)

published: Tue Aug 30 2022 06:59:28 GMT+0000 (UTC)

arXiv
