PandaNet: Anchor-Based Single-Shot Multi-Person 3D Pose Estimation

Abdallah Benzine; Florian Chabot; Bertrand Luvison; Quoc Cong Pham; Catherine Achard

Several deep learning models have recently been proposed for 3D human pose estimation. Nevertheless, most of these approaches focus only on the single-person case or estimate the 3D poses of a few people at high resolution. However, many applications, such as autonomous driving or crowd analysis, require pose estimation of a large number of people, possibly at low resolution. In this work, we present PandaNet (Pose estimAtioN and Detection Anchor-based Network), a new single-shot, anchor-based, multi-person 3D pose estimation approach. The proposed model performs bounding box detection and, for each detected person, 2D and 3D pose regression in a single forward pass. Because the network predicts a full 3D pose for each bounding box, it needs no post-processing to regroup joints, and it can estimate the poses of a possibly large number of people at low resolution. To handle overlapping people, we introduce a Pose-Aware Anchor Selection strategy. Moreover, because people appear at imbalanced sizes in the image and joint coordinates have different uncertainties depending on these sizes, we propose a method to automatically optimize the weights associated with different people scales and joints for efficient training. PandaNet surpasses previous single-shot methods on several challenging datasets: a virtual but very realistic multi-person urban dataset (JTA Dataset) and two real-world 3D multi-person datasets (CMU Panoptic and MuPoTS-3D).
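
The abstract does not say how the weights for different people scales and joints are optimized. As an illustration only, one widely used technique for learning loss weights is uncertainty-based multi-task weighting, where each loss term L_i gets a learnable log-variance s_i and the total objective is the sum of exp(-s_i) * L_i + s_i. The sketch below assumes that technique; it is not taken from this page, and the function names (`weighted_loss`, `fit_log_vars`) are hypothetical. The loss values are held fixed so that the behaviour of the weights is easy to see.

```python
import math


def weighted_loss(task_losses, log_vars):
    """Combine loss terms using learnable log-variances s_i (an assumed scheme):
    total = sum_i exp(-s_i) * L_i + s_i."""
    if len(task_losses) != len(log_vars):
        raise ValueError("need one log-variance per loss term")
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


def log_var_gradients(task_losses, log_vars):
    """Gradient of the total w.r.t. each s_i: d/ds [exp(-s) * L + s] = 1 - exp(-s) * L."""
    return [1.0 - math.exp(-s) * loss for loss, s in zip(task_losses, log_vars)]


def fit_log_vars(task_losses, steps=2000, lr=0.05):
    """Plain gradient descent on the log-variances with the losses held fixed.
    In real training the network weights would be updated jointly."""
    log_vars = [0.0] * len(task_losses)
    for _ in range(steps):
        grads = log_var_gradients(task_losses, log_vars)
        log_vars = [s - lr * g for s, g in zip(log_vars, grads)]
    return log_vars


if __name__ == "__main__":
    # Hypothetical loss magnitudes, e.g. one term per person scale or joint.
    losses = [4.0, 0.25]
    log_vars = fit_log_vars(losses)
    weights = [math.exp(-s) for s in log_vars]
    print(log_vars, weights)
```

With fixed losses, each s_i settles near log(L_i), so the effective weight exp(-s_i) approaches 1 / L_i: terms with larger, noisier losses are down-weighted and the additive s_i term keeps the weights from collapsing to zero.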

updated: Thu Jan 07 2021 10:32:17 GMT+0000 (UTC)

published: Thu Jan 07 2021 10:32:17 GMT+0000 (UTC)

arXiv
