Human Motion Prediction via Spatio-Temporal Inpainting

Alejandro Hernandez Ruiz; Juergen Gall; Francesc Moreno-Noguer

We propose a Generative Adversarial Network (GAN) to forecast 3D human motion given a sequence of past 3D skeleton poses. While recent GANs have shown promising results, they can only forecast plausible motion over relatively short periods of time (a few hundred milliseconds) and typically ignore the absolute position of the skeleton with respect to the camera. Our scheme provides long-term predictions (two seconds or more) for both the body pose and its absolute position. Our approach builds upon three main contributions. First, we represent the data using a spatio-temporal tensor of 3D skeleton coordinates, which allows formulating the prediction problem as an inpainting one, for which GANs work particularly well. Second, we design an architecture to learn the joint distribution of body poses and global motion, capable of hypothesizing large chunks of the input 3D tensor with missing data. Finally, we argue that the L2 metric, considered so far by most approaches, fails to capture the actual distribution of long-term human motion. We propose two alternative metrics, based on the distribution of frequencies, that are able to capture more realistic motion patterns. Extensive experiments demonstrate that our approach significantly improves the state of the art, while also handling situations in which past observations are corrupted by occlusions, noise, and missing frames.
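To make the inpainting formulation concrete, the following is a minimal sketch of how past poses might be arranged into a spatio-temporal tensor whose future frames are left blank for a generator to fill in. The function name `build_inpainting_input`, the (time, joint, xyz) layout, the zero padding, and the per-frame binary mask are all illustrative assumptions; the abstract does not specify these details, and the authors' actual preprocessing may differ.

```python
from typing import List, Tuple

# One pose = one (x, y, z) triple per skeleton joint (assumed layout).
Pose = List[Tuple[float, float, float]]


def build_inpainting_input(past: List[Pose], n_future: int):
    """Hypothetical helper: stack observed poses into a (time, joint, xyz)
    tensor and append zero-filled frames for the part to be predicted.

    Returns the tensor (nested lists) and a per-frame mask where
    1 marks an observed frame and 0 marks a frame to be inpainted.
    """
    if not past:
        raise ValueError("need at least one observed pose")
    if n_future < 0:
        raise ValueError("n_future must be non-negative")
    n_joints = len(past[0])
    if any(len(pose) != n_joints for pose in past):
        raise ValueError("all poses must have the same number of joints")

    # Observed part of the tensor: copy the coordinates frame by frame.
    tensor = [[list(joint) for joint in pose] for pose in past]
    # Unknown part: placeholder zeros (an assumption, not the paper's choice).
    tensor += [[[0.0, 0.0, 0.0] for _ in range(n_joints)]
               for _ in range(n_future)]
    mask = [1] * len(past) + [0] * n_future
    return tensor, mask


# Usage: three observed frames of a two-joint skeleton, two frames to predict.
past = [[(0.0, 1.0, 2.0), (0.5, 1.5, 2.5)] for _ in range(3)]
tensor, mask = build_inpainting_input(past, n_future=2)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))
print(mask)
```

Under this framing, forecasting amounts to filling the masked region of the tensor. Occluded joints or dropped frames in the past could in principle be marked with the same kind of mask, which is consistent with the abstract's claim about handling corrupted observations.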

updated: Tue Oct 29 2019 03:41:27 GMT+0000 (UTC)

published: Thu Dec 13 2018 15:27:15 GMT+0000 (UTC)

arXiv
