Holistic 3D Human and Scene Mesh Estimation from Single View Images

Zhenzhen Weng; Serena Yeung

The 3D world constrains human body pose, and human body pose in turn conveys information about the surrounding objects. Indeed, from a single image of a person in an indoor scene, humans are adept at resolving ambiguities in the human pose and room layout by drawing on their knowledge of physical laws and their priors over plausible object and human poses. However, few computer vision models fully leverage this fact. In this work, we propose an end-to-end trainable model that perceives the 3D scene from a single RGB image, estimates the camera pose and the room layout, and reconstructs both human body and object meshes. By imposing a comprehensive and sophisticated set of losses on all aspects of the estimation, we show that our model outperforms existing human body mesh methods and indoor scene reconstruction methods. To the best of our knowledge, this is the first model that outputs both object and human predictions at the mesh level and performs joint optimization over the scene and human poses.

updated: Fri Apr 16 2021 17:30:41 GMT+0000 (UTC)

published: Wed Dec 02 2020 23:22:03 GMT+0000 (UTC)

arXiv
