Learning Constrained Dynamic Correlations in Spatiotemporal Graphs for Motion Prediction

Jiajun Fu; Fuxing Yang; Xiaoli Liu; Jianqin Yin

Human motion prediction is a challenging task because spatiotemporal correlations are dynamic and differ from one motion sequence to another. Efficiently representing these correlations, and modeling how they vary between motion sequences, remains a challenge for spatiotemporal representation in motion prediction. In this work, we propose Dynamic SpatioTemporal Decompose Graph Convolution (DSTD-GC), which decomposes dynamic spatiotemporal graph modeling into a combination of Dynamic Spatial Graph Convolution (DS-GC) and Dynamic Temporal Graph Convolution (DT-GC). The dynamic spatial/temporal correlations in DS-GC/DT-GC are efficiently represented by Constrained Dynamic Correlation Modeling, which is inspired by constraints common to human motion, such as body connections, and by the dynamic patterns that differ across samples. Constrained Dynamic Correlation Modeling represents the spatial/temporal graph as a combination of a shared spatial/temporal correlation and an unshared correlation extraction function. This spatiotemporal representation has quadratic space complexity and requires only 28.6% of the parameters of the state-of-the-art sample-shared decomposition representation. It also explicitly models sample-specific spatiotemporal correlation variances. We also mathematically reformulate graph convolutions on spatiotemporal graphs into a unified form and find that DSTD-GC relaxes certain constraints of other graph convolutions, which leads to a stronger representation capability. Combining DSTD-GC with prior knowledge such as body connections and temporal context, we propose a powerful spatiotemporal graph convolution network called DSTD-GCN. On the Human3.6M and CMU Mocap datasets, DSTD-GCN outperforms state-of-the-art methods by 3.9% to 5.7% in prediction accuracy with a 55.0% to 96.9% reduction in parameters.
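The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes a pose sequence stored as `x[t][j]` (the feature vector of joint `j` at frame `t`) and builds each graph as a shared matrix plus a sample-specific adjustment. The function names, the dot-product similarity used as the extraction function, and the `scale` parameter are all hypothetical choices for illustration; the abstract does not specify them, and learned feature transforms are omitted.

```python
# Illustrative sketch only: the similarity function and the way the two
# terms are combined are assumptions, not the paper's formulation.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]

def dynamic_adjacency(shared, features, scale=0.1):
    """Combine a shared correlation with a sample-specific adjustment.

    `shared` is an N x N matrix reused for every sample. The adjustment is a
    hypothetical extraction function: scaled dot-product similarity between
    the N node feature vectors in `features`.
    """
    similarity = matmul(features, transpose(features))
    return [[s + scale * d for s, d in zip(s_row, d_row)]
            for s_row, d_row in zip(shared, similarity)]

def spatial_then_temporal(x, shared_spatial, shared_temporal):
    """Mix joints within each frame, then mix frames for each joint."""
    # Spatial step: one J x J graph per frame, applied to that frame's joints.
    after_spatial = [matmul(dynamic_adjacency(shared_spatial, frame), frame)
                     for frame in x]
    # Temporal step: one T x T graph per joint, applied to that joint's track.
    num_joints = len(x[0])
    per_joint = []
    for j in range(num_joints):
        track = [frame[j] for frame in after_spatial]
        per_joint.append(matmul(dynamic_adjacency(shared_temporal, track), track))
    # Restore the [frame][joint] layout.
    return [[per_joint[j][t] for j in range(num_joints)] for t in range(len(x))]

# Toy input: 2 frames, 3 joints, 2 channels; identity matrices as shared graphs.
x = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
     [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]]
identity_joints = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
identity_frames = [[1.0, 0.0], [0.0, 1.0]]
out = spatial_then_temporal(x, identity_joints, identity_frames)
```

In this sketch the shared matrices hold J x J and T x T entries, rather than one entry for every pair of joint-frame nodes, which is the kind of saving a decomposed representation aims for. The sample-specific term is recomputed from each input, so two different motion sequences are processed with different graphs.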

updated: Thu May 12 2022 03:40:02 GMT+0000 (UTC)

published: Mon Apr 04 2022 08:11:06 GMT+0000 (UTC)

arXiv
