An Attractor-Guided Neural Networks for Skeleton-Based Human Motion Prediction

Pengxiang Ding; Jianqin Yin

Joint relation modeling is a crucial component of human motion prediction. Most existing methods design skeleton-based graphs to model the relations among joints, so local interactions between joint pairs are well learned. However, the global coordination of all joints, which reflects the balance property of human motion, is usually weakened because it is learned from part to whole, progressively and asynchronously. As a result, the final predicted motions are sometimes unnatural. To tackle this issue, we learn a medium, called the balance attractor (BA), from the spatiotemporal features of motion to characterize the global motion features; the BA is subsequently used to build new joint relations. Through the BA, all joints are related synchronously, so the global coordination of all joints can be better learned. Based on the BA, we propose our framework, referred to as the Attractor-Guided Neural Network, which mainly consists of an Attractor-Based Joint Relation Extractor (AJRE) and a Multi-timescale Dynamics Extractor (MTDE). The AJRE mainly includes a Global Coordination Extractor (GCE) and a Local Interaction Extractor (LIE): the former represents the global coordination of all joints, and the latter encodes local interactions between joint pairs. The MTDE is designed to extract dynamic information from raw position information for effective prediction. Extensive experiments show that the proposed framework outperforms state-of-the-art methods in both short-term and long-term prediction on H3.6M, CMU-Mocap, and 3DPW.
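
To make the two ideas in the abstract concrete, the following is a minimal, non-learned sketch in plain Python. It is not the authors' method: the abstract says the balance attractor is learned from spatiotemporal features, whereas here the centroid of the joints is used as a hypothetical stand-in, and fixed-stride frame differences stand in for multi-timescale dynamics. The function names `balance_point`, `joint_offsets`, and `multi_timescale_diffs`, and the stride values, are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence

Pose = List[List[float]]   # one frame: J joints, each [x, y, z]
Motion = List[Pose]        # T frames


def balance_point(pose: Pose) -> List[float]:
    """Centroid of all joints: an assumed, non-learned stand-in for a global attractor."""
    num_joints = len(pose)
    return [sum(joint[d] for joint in pose) / num_joints for d in range(3)]


def joint_offsets(pose: Pose) -> Pose:
    """Relate every joint to the same shared point at once (synchronously)."""
    center = balance_point(pose)
    return [[joint[d] - center[d] for d in range(3)] for joint in pose]


def multi_timescale_diffs(
    motion: Motion, strides: Sequence[int] = (1, 2, 4)
) -> Dict[int, Motion]:
    """For each stride s, compute motion[t] - motion[t - s] for every joint.

    Small strides capture fast dynamics; larger strides capture slower trends.
    """
    diffs: Dict[int, Motion] = {}
    for s in strides:
        diffs[s] = [
            [
                [motion[t][j][d] - motion[t - s][j][d] for d in range(3)]
                for j in range(len(motion[t]))
            ]
            for t in range(s, len(motion))
        ]
    return diffs
```

In this sketch every joint is compared against one shared reference rather than against its skeletal neighbours, which is the intuition behind relating all joints synchronously; a learned attractor would replace the centroid.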

updated: Thu May 20 2021 12:51:39 GMT+0000 (UTC)

published: Thu May 20 2021 12:51:39 GMT+0000 (UTC)

arXiv
