MEBOW: Monocular Estimation of Body Orientation In the Wild

Chenyan Wu; Yukun Chen; Jiajia Luo; Che-Chun Su; Anuja Dawane; Bikramjot Hanzra; Zhuo Deng; Bilan Liu; James Wang; Cheng-Hao Kuo

Body orientation estimation provides crucial visual cues in many applications, including robotics and autonomous driving. It is particularly desirable when 3-D pose is difficult to infer due to poor image resolution, occlusion, or indistinguishable body parts. We present COCO-MEBOW (Monocular Estimation of Body Orientation in the Wild), a new large-scale dataset for orientation estimation from a single in-the-wild image. The body-orientation labels for around 130K human bodies within 55K images from the COCO dataset have been collected using an efficient and high-precision annotation pipeline. We also validate the benefits of the dataset. First, we show that our dataset can substantially improve the performance and robustness of a human body orientation estimation model, the development of which was previously limited by the scale and diversity of the available training data. Second, we present a novel triple-source solution for 3-D human pose estimation, in which 3-D pose labels, 2-D pose labels, and our body-orientation labels are all used in joint training. Our model significantly outperforms state-of-the-art dual-source solutions for monocular 3-D human pose estimation, whose training uses only 3-D pose labels and 2-D pose labels. This substantiates an important advantage of MEBOW for 3-D human pose estimation, which is particularly appealing because the per-instance labeling cost for body orientations is far lower than that for 3-D poses. The work demonstrates the high potential of MEBOW in addressing real-world challenges that involve understanding human behavior. Further information on this work is available at https://chenyanwu.github.io/MEBOW/.

updated: Fri Nov 27 2020 11:56:13 GMT+0000 (UTC)

published: Fri Nov 27 2020 11:56:13 GMT+0000 (UTC)

arXiv
