Spatiotemporal Co-attention Recurrent Neural Networks for Human-Skeleton Motion Prediction

Xiangbo Shu; Liyan Zhang; Guo-Jun Qi; Wei Liu; Jinhui Tang

Human motion prediction aims to generate future motions based on observed human motions. Given the success of Recurrent Neural Networks (RNNs) in modeling sequential data, recent works use RNNs to model human-skeleton motion in the observed motion sequence and to predict future human motions. However, these methods do not consider the spatial coherence among joints or the temporal evolution among skeletons, which reflect crucial characteristics of human motion in spatiotemporal space. To this end, we propose novel Skeleton-joint Co-attention Recurrent Neural Networks (SC-RNN) that simultaneously capture the spatial coherence among joints and the temporal evolution among skeletons on a skeleton-joint co-attention feature map in spatiotemporal space. First, a skeleton-joint feature map is constructed as the representation of the observed motion sequence. Second, we design a new Skeleton-joint Co-Attention (SCA) mechanism that dynamically learns a skeleton-joint co-attention feature map from this skeleton-joint feature map, refining the useful observed motion information to predict one future motion. Third, a GRU variant embedded with SCA collaboratively models human-skeleton motion and human-joint motion in spatiotemporal space by treating the skeleton-joint co-attention feature map as the motion context. Experimental results on human motion prediction demonstrate that the proposed method outperforms related methods.
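
The abstract describes three steps (a skeleton-joint feature map, a co-attention mechanism over that map, and a GRU that consumes the attended map as motion context) without giving equations. The NumPy sketch below is a minimal illustration under our own assumptions, not the authors' SC-RNN: the feature map is simply the observed joint coordinates stacked into a (T, J, D) array, the co-attention is an additive (tanh) scorer normalised jointly over frames and joints, the context is the attention-weighted sum of the map, and the GRU variant is a standard GRU whose input is the previous pose concatenated with that context. All sizes, the weight names (`W_f`, `W_h`, `v`, `W_out`) and the residual output step are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

# Hypothetical sizes (not from the paper): T observed frames, J joints,
# D dims per joint, A attention units, H hidden units.
T, J, D, A, H = 10, 17, 3, 16, 32
rng = np.random.default_rng(0)


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params():
    """Random weights for illustration only; a real model would learn them."""
    g = lambda *shape: rng.normal(0.0, 0.1, shape)
    in_dim = J * D + D  # GRU input: previous pose + attention context
    return {
        "W_f": g(D, A), "W_h": g(H, A), "v": g(A),  # hypothetical co-attention scorer
        "W": g(3, in_dim, H), "U": g(3, H, H),      # GRU weights for gates z, r, n
        "b": np.zeros((3, H)),
        "W_out": g(H, J * D),                       # hidden state -> pose offset
    }


def co_attention(fmap, h, p):
    """Score every (frame, joint) cell of the (T, J, D) map against the hidden
    state and normalise over both axes at once, so frames and joints compete
    for the same attention mass (an assumption, not the paper's formula)."""
    scores = np.tanh(fmap @ p["W_f"] + h @ p["W_h"]) @ p["v"]  # (T, J)
    alpha = softmax(scores.ravel()).reshape(scores.shape)       # sums to 1
    attended = alpha[..., None] * fmap                          # co-attention map (T, J, D)
    return alpha, attended.sum(axis=(0, 1))                     # context vector (D,)


def gru_step(x, h, p):
    """Standard GRU update; the input x already carries the attention context."""
    W, U, b = p["W"], p["U"], p["b"]
    z = sigmoid(x @ W[0] + h @ U[0] + b[0])        # update gate
    r = sigmoid(x @ W[1] + h @ U[1] + b[1])        # reset gate
    n = np.tanh(x @ W[2] + r * (h @ U[2]) + b[2])  # candidate state
    return (1.0 - z) * n + z * h


def predict(observed, steps, p):
    """observed: (T, J, D) skeleton-joint feature map; returns (steps, J, D)."""
    fmap = np.asarray(observed, dtype=float)
    h = np.zeros(H)
    pose = fmap[-1].ravel()  # start from the last observed skeleton
    preds = []
    for _ in range(steps):
        _, ctx = co_attention(fmap, h, p)  # re-attend to the observed map each step
        h = gru_step(np.concatenate([pose, ctx]), h, p)
        pose = pose + h @ p["W_out"]       # hypothetical residual: predict an offset
        preds.append(pose.reshape(J, D))
    return np.stack(preds)


params = init_params()
observed = rng.normal(size=(T, J, D))  # stand-in for a real observed sequence
future = predict(observed, 5, params)
alpha, _ = co_attention(observed, np.zeros(H), params)
print(future.shape)  # (5, 17, 3)
```

Normalising over all T·J cells at once, rather than separately over frames and over joints, is one simple way to let a single weight map express both temporal and spatial emphasis; the paper may factor the co-attention differently.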

updated: Tue Oct 01 2019 15:30:51 GMT+0000 (UTC)

published: Sun Sep 29 2019 09:50:24 GMT+0000 (UTC)

arXiv