Bottom-Up 2D Pose Estimation via Dual Anatomical Centers for Small-Scale Persons

Yu Cheng; Yihao Ai; Bo Wang; Xinchao Wang; Robby T. Tan

In multi-person 2D pose estimation, bottom-up methods predict the poses of all persons simultaneously and, unlike top-down methods, do not rely on human detection. However, the accuracy of state-of-the-art (SOTA) bottom-up methods is still inferior to that of existing top-down methods. This is because the predicted poses are regressed from inconsistent human bounding-box centers and lack human-scale normalization, which makes the predicted poses inaccurate and causes small-scale persons to be missed. To push the envelope of bottom-up pose estimation, we first propose multi-scale training, which strengthens the network's handling of scale variation under single-scale testing, particularly for small-scale persons. Second, we introduce dual anatomical centers (i.e., head and body), from which human poses can be predicted more accurately and reliably, especially for small-scale persons. Moreover, existing bottom-up methods use multi-scale testing to boost pose estimation accuracy at the price of multiple additional forward passes, which weakens efficiency, the core strength of bottom-up methods compared to top-down methods. By contrast, our multi-scale training enables the model to predict high-quality poses in a single forward pass (i.e., single-scale testing). Our method achieves a 38.4% improvement in bounding box precision and a 39.1% improvement in bounding box recall over the SOTA on the challenging small-scale persons subset of COCO. For the human pose AP evaluation, we achieve a new SOTA (71.0 AP) on the COCO test-dev set with single-scale testing. We also achieve the top performance (40.3 AP) on the OCHuman dataset in cross-dataset evaluation.
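
The idea of regressing a pose from an anatomical center can be illustrated with a minimal sketch. It assumes that each center (head or body) comes with one regressed offset vector per keypoint, and that the two resulting pose hypotheses are combined by simple averaging. The function names `decode_pose` and `fuse_poses`, the toy coordinates, and the averaging rule are all hypothetical choices made for illustration; the abstract does not specify how the two centers are combined, so this should not be read as the authors' implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def decode_pose(center: Point, offsets: List[Point]) -> List[Point]:
    """Place each keypoint at the center plus its regressed offset."""
    cx, cy = center
    return [(cx + dx, cy + dy) for dx, dy in offsets]


def fuse_poses(pose_a: List[Point], pose_b: List[Point]) -> List[Point]:
    """Average two pose hypotheses keypoint by keypoint.

    Averaging is an assumption for illustration only; it is not taken
    from the paper.
    """
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must have the same number of keypoints")
    return [
        ((ax + bx) / 2.0, (ay + by) / 2.0)
        for (ax, ay), (bx, by) in zip(pose_a, pose_b)
    ]


# Toy example with two keypoints (say, nose and left ankle), in pixels.
body_center = (50.0, 60.0)
head_center = (50.0, 30.0)
body_offsets = [(0.0, -30.0), (-10.0, 20.0)]  # offsets from the body center
head_offsets = [(0.0, 2.0), (-12.0, 50.0)]    # offsets from the head center

pose_from_body = decode_pose(body_center, body_offsets)
pose_from_head = decode_pose(head_center, head_offsets)
fused_pose = fuse_poses(pose_from_body, pose_from_head)
```

In this toy case the two hypotheses disagree by a couple of pixels per keypoint, and the fused pose lands between them. A real system would more plausibly weight each hypothesis by a confidence score, which is omitted here.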

updated: Wed Nov 23 2022 05:03:37 GMT+0000 (UTC)

published: Thu Aug 25 2022 10:09:10 GMT+0000 (UTC)

arXiv
