GIMO: Gaze-Informed Human Motion Prediction in Context

Yang Zheng; Yanchao Yang; Kaichun Mo; Jiaman Li; Tao Yu; Yebin Liu; C. Karen Liu; Leonidas J. Guibas

Predicting human motion is critical for assistive robots and AR/VR applications, where interaction with humans needs to be safe and comfortable. Accurate prediction, in turn, depends on understanding both the scene context and human intentions. Even though many works study scene-aware human motion prediction, the role of human intent is largely underexplored due to the lack of ego-centric views that disclose human intent and the limited diversity in motion and scenes. To reduce the gap, we propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze, which serves as a surrogate for inferring human intent. By employing inertial sensors for motion capture, our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects. We perform an extensive study of the benefits of leveraging eye gaze for ego-centric human motion prediction with various state-of-the-art architectures. Moreover, to realize the full potential of the gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches. Our network achieves the top performance in human motion prediction on the proposed dataset, thanks to the intent information from eye gaze and the denoised gaze feature modulated by the motion. Code and data can be found at https://github.com/y-zheng18/GIMO.

updated: Tue Jul 19 2022 16:01:02 GMT+0000 (UTC)

published: Wed Apr 20 2022 13:17:39 GMT+0000 (UTC)

arXiv
