Scene-aware Egocentric 3D Human Pose Estimation

Jian Wang; Lingjie Liu; Weipeng Xu; Kripasindhu Sarkar; Diogo Luvizon; Christian Theobalt

Egocentric 3D human pose estimation with a single head-mounted fisheye camera has recently attracted attention due to its numerous applications in virtual and augmented reality. Existing methods still struggle with challenging poses in which the human body is heavily occluded or closely interacting with the scene. To address this issue, we propose a scene-aware egocentric pose estimation method that guides the prediction of the egocentric pose with scene constraints. To this end, we propose an egocentric depth estimation network that predicts the scene depth map from a wide-view egocentric fisheye camera while mitigating occlusion by the human body with a depth-inpainting network. Next, we propose a scene-aware pose estimation network that projects the 2D image features and the estimated scene depth map into a voxel space and regresses the 3D pose with a V2V network. The voxel-based feature representation provides a direct geometric connection between 2D image features and scene geometry, and further enables the V2V network to constrain the predicted pose based on the estimated scene geometry. To enable the training of these networks, we also generated a synthetic dataset, called EgoGTA, and an in-the-wild dataset based on EgoPW, called EgoPW-Scene. Experimental results on our new evaluation sequences show that the predicted 3D egocentric poses are accurate and physically plausible in terms of human-scene interaction, demonstrating that our method outperforms state-of-the-art methods both quantitatively and qualitatively.
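The abstract describes lifting 2D image features and an estimated scene depth map into a shared voxel space before a V2V network regresses the pose. The following is a minimal NumPy sketch of such a voxelization step, not the authors' implementation. It rests on assumptions the abstract does not state: an equidistant fisheye model (r = f * theta), depth stored as Euclidean distance along each pixel ray, nearest-neighbour feature sampling, and a binary occupancy channel for the scene. The function names `fisheye_unproject`, `fisheye_project`, and `build_voxel_input` and all of their parameters are hypothetical.

```python
import numpy as np


def fisheye_unproject(depth, f, cx, cy):
    """Lift a depth map (H, W) to camera-space 3D points (H*W, 3).

    Assumption: equidistant fisheye model (r = f * theta) and depth given as
    Euclidean distance along each pixel's viewing ray.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w].astype(np.float64)
    dx, dy = u - cx, v - cy
    theta = np.sqrt(dx**2 + dy**2) / f      # angle from the optical axis
    phi = np.arctan2(dy, dx)                # azimuth around the optical axis
    rays = np.stack([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)], axis=-1)
    return (rays * depth[..., None]).reshape(-1, 3)


def fisheye_project(points, f, cx, cy):
    """Project camera-space points (N, 3) to pixel coordinates (N, 2)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    rho = np.sqrt(x**2 + y**2)
    theta = np.arctan2(rho, z)
    # Points on the optical axis (rho == 0) map to the principal point.
    scale = np.where(rho > 1e-9, f * theta / np.maximum(rho, 1e-9), 0.0)
    return np.stack([cx + scale * x, cy + scale * y], axis=1)


def build_voxel_input(depth, features, f, cx, cy, origin, voxel_size, grid):
    """Return a (C + 1, D, D, D) volume as input to a 3D network such as V2V.

    Channels 0..C-1: image features sampled (nearest neighbour) at the
    projection of each voxel center. Channel C: binary scene occupancy
    built from the unprojected depth map. (Hypothetical design.)
    """
    h, w = depth.shape
    c = features.shape[0]
    idx = np.arange(grid)
    ijk = np.stack(np.meshgrid(idx, idx, idx, indexing="ij"), axis=-1).reshape(-1, 3)
    centers = origin + (ijk + 0.5) * voxel_size

    # Back-project 2D features: each voxel takes the feature of the pixel it projects to.
    uv = fisheye_project(centers, f, cx, cy)
    in_img = (uv[:, 0] >= 0) & (uv[:, 0] <= w - 1) & (uv[:, 1] >= 0) & (uv[:, 1] <= h - 1)
    ui = np.clip(np.round(uv[:, 0]).astype(int), 0, w - 1)
    vi = np.clip(np.round(uv[:, 1]).astype(int), 0, h - 1)
    feat = features[:, vi, ui] * in_img     # zero out voxels projecting outside the image

    # Scene geometry: mark every voxel hit by at least one depth point as occupied.
    pts = fisheye_unproject(depth, f, cx, cy)
    cell = np.floor((pts - origin) / voxel_size).astype(int)
    inside = np.all((cell >= 0) & (cell < grid), axis=1)
    occ = np.zeros((grid, grid, grid))
    occ[cell[inside, 0], cell[inside, 1], cell[inside, 2]] = 1.0

    return np.concatenate([feat.reshape(c, grid, grid, grid), occ[None]], axis=0)


if __name__ == "__main__":
    depth = np.full((64, 64), 2.0)                 # toy scene: every ray hits at 2 m
    features = np.random.rand(8, 64, 64)           # toy 8-channel feature map
    vol = build_voxel_input(depth, features, f=20.0, cx=32.0, cy=32.0,
                            origin=np.array([-2.5, -2.5, -2.5]),
                            voxel_size=0.25, grid=20)
    print(vol.shape)  # (9, 20, 20, 20)
```

Stacking the image features and the scene occupancy in the same voxel grid is one simple way to give a 3D convolutional network a direct geometric link between appearance and scene geometry, which is the property the abstract attributes to its voxel-based representation; the paper's actual camera model, sampling scheme, and channel layout may differ.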

updated: Tue Dec 20 2022 21:35:39 GMT+0000 (UTC)

published: Tue Dec 20 2022 21:35:39 GMT+0000 (UTC)

arXiv