Non-local Graph Convolutional Network for joint Activity Recognition and Motion Prediction

Dianhao Zhang; Ngo Anh Vien; Mien Van; Sean McLoone

3D skeleton-based motion prediction and activity recognition are two interwoven tasks in human behaviour analysis. In this work, we propose a motion context modeling methodology that provides a new way to combine the advantages of both graph convolutional neural networks and recurrent neural networks for joint human motion prediction and activity recognition. Our approach is based on using an LSTM encoder-decoder and a non-local feature extraction attention mechanism to model the spatial correlation of human skeleton data and temporal correlation among motion frames. The proposed network can easily include two output branches, one for Activity Recognition and one for Future Motion Prediction, which can be jointly trained for enhanced performance. Experimental results on Human 3.6M, CMU Mocap and NTU RGB-D datasets show that our proposed approach provides the best prediction capability among baseline LSTM-based methods, while achieving comparable performance to other state-of-the-art methods.
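
As a rough illustration of the "non-local" idea mentioned in the abstract, the sketch below applies a generic non-local (self-attention) operation to the joints of a single skeleton frame: every joint aggregates features from all other joints, weighted by pairwise similarity, rather than only from its skeletal neighbours. This is not the authors' implementation. The function names `softmax` and `non_local_block` are hypothetical, the dot-product affinity and residual connection are assumptions borrowed from the standard non-local formulation, and the learned embedding weights a real network would use are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local_block(joints):
    """Parameter-free non-local operation over one skeleton frame.

    joints: list of per-joint feature vectors (e.g. 3D coordinates).
    Returns a list of the same shape. Assumption: identity embeddings
    stand in for the learned projections of a trained model.
    """
    out = []
    for qi in joints:
        # Pairwise affinity: dot product between this joint and every joint.
        scores = [sum(a * b for a, b in zip(qi, kj)) for kj in joints]
        weights = softmax(scores)
        # Aggregate features from all joints, not only skeletal neighbours.
        agg = [
            sum(w * kj[d] for w, kj in zip(weights, joints))
            for d in range(len(qi))
        ]
        # Residual connection keeps the original joint feature.
        out.append([x + a for x, a in zip(qi, agg)])
    return out
```

In a model of the kind the abstract describes, features refined this way would presumably be passed frame by frame to the LSTM encoder-decoder, whose outputs feed the recognition and prediction branches; that wiring is not shown here.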

updated: Tue Aug 03 2021 14:07:10 GMT+0000 (UTC)

published: Tue Aug 03 2021 14:07:10 GMT+0000 (UTC)

arXiv
