Rethinking Keypoint Representations: Modeling Keypoints and Poses as Objects for Multi-Person Human Pose Estimation

William McNally; Kanav Vats; Alexander Wong; John McPhee

In keypoint estimation tasks such as human pose estimation, heatmap-based regression is the dominant approach despite possessing notable drawbacks: heatmaps intrinsically suffer from quantization error and require excessive computation to generate and post-process. Motivated to find a more efficient solution, we propose a new heatmap-free keypoint estimation method in which individual keypoints and sets of spatially related keypoints (i.e., poses) are modeled as objects within a dense single-stage anchor-based detection framework. Hence, we call our method KAPAO (pronounced "Ka-Pow!") for Keypoints And Poses As Objects. We apply KAPAO to the problem of single-stage multi-person human pose estimation by simultaneously detecting human pose objects and keypoint objects and fusing the detections to exploit the strengths of both object representations. In experiments, we observe that KAPAO is significantly faster and more accurate than previous methods, which suffer greatly from heatmap post-processing. Moreover, the accuracy-speed trade-off is especially favourable in the practical setting when not using test-time augmentation. Our large model, KAPAO-L, achieves an AP of 70.6 on the Microsoft COCO Keypoints validation set without test-time augmentation, which is 2.5x faster and 4.0 AP more accurate than the next best single-stage model. Furthermore, KAPAO excels in the presence of heavy occlusion. On the CrowdPose test set, KAPAO-L achieves new state-of-the-art accuracy for a single-stage method with an AP of 68.9.
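The abstract says pose objects and keypoint objects are detected together and then fused, but it does not spell out the fusion rule. The following is a minimal sketch of one way such a fusion step might look, not the authors' implementation. The `PoseObject` and `KeypointObject` structures, the `fuse` function, the `max_dist` pixel threshold, and the "closest same-class keypoint, higher confidence wins" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class KeypointObject:
    """A single keypoint detected as its own object (hypothetical structure)."""
    cls: int      # keypoint class index, e.g. 0 = nose
    x: float
    y: float
    conf: float


@dataclass
class PoseObject:
    """A person detection carrying one [x, y, conf] estimate per keypoint class."""
    keypoints: list  # list of [x, y, conf], indexed by keypoint class
    conf: float


def fuse(poses, keypoint_objects, max_dist=10.0):
    """Refine pose keypoints with nearby, more confident keypoint objects.

    Assumed rule (not taken from the paper): each keypoint object is matched
    to the pose whose same-class keypoint is closest, and overrides that
    keypoint when the distance is within max_dist pixels and the keypoint
    object's confidence is higher.
    """
    for kp in keypoint_objects:
        best, best_dist = None, max_dist
        for pose in poses:
            px, py, _ = pose.keypoints[kp.cls]
            d = hypot(kp.x - px, kp.y - py)
            if d <= best_dist:
                best, best_dist = pose, d
        if best is not None and kp.conf > best.keypoints[kp.cls][2]:
            best.keypoints[kp.cls] = [kp.x, kp.y, kp.conf]
    return poses
```

Under these assumptions, a keypoint object lying close to a low-confidence pose keypoint of the same class replaces it, while a keypoint object far from every pose is left unused. This reflects the idea stated in the abstract: the pose object supplies the grouping of keypoints into a person, and the keypoint objects supply more precise locations where they are available.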

updated: Tue Nov 16 2021 15:36:44 GMT+0000 (UTC)

published: Tue Nov 16 2021 15:36:44 GMT+0000 (UTC)

arXiv
