YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss

Debapriya Maji; Soyeb Nagori; Manu Mathew; Deepak Poddar

We introduce YOLO-pose, a novel heatmap-free approach for joint detection and 2D multi-person pose estimation in an image, based on the popular YOLO object detection framework. Existing heatmap-based two-stage approaches are sub-optimal because they are not end-to-end trainable and their training relies on a surrogate L1 loss that is not equivalent to maximizing the evaluation metric, i.e. Object Keypoint Similarity (OKS). Our framework allows us to train the model end-to-end and optimize the OKS metric itself. The proposed model learns to jointly detect bounding boxes for multiple persons and their corresponding 2D poses in a single forward pass, thus bringing together the best of both top-down and bottom-up approaches. The proposed approach does not require the postprocessing that bottom-up approaches use to group detected keypoints into a skeleton, because each bounding box has an associated pose, which results in an inherent grouping of the keypoints. Unlike top-down approaches, multiple forward passes are unnecessary, since all persons are localized along with their poses in a single inference. YOLO-pose achieves new state-of-the-art results on the COCO validation set (90.2% AP50) and test-dev set (90.3% AP50), surpassing all existing bottom-up approaches in a single forward pass without flip testing, multi-scale testing, or any other test-time augmentation. All experiments and results reported in this paper are obtained without any test-time augmentation, unlike traditional approaches that use flip testing and multi-scale testing to boost performance. Our training code will be made publicly available at https://github.com/TexasInstruments/edgeai-yolov5 and https://github.com/TexasInstruments/edgeai-yolox
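
The OKS metric that the abstract refers to can be sketched roughly as follows. This is a minimal illustration based on the COCO keypoint evaluation definition, not the authors' training code: the function names `oks` and `oks_loss` are hypothetical, and using `1 - OKS` as the loss is an assumption about how "optimizing the OKS metric itself" might look in practice. The per-keypoint constants are the ones published with the COCO evaluation for its 17 joints.

```python
import math

# Per-keypoint constants from the COCO keypoint evaluation (17 joints:
# nose, eyes, ears, shoulders, elbows, wrists, hips, knees, ankles).
COCO_SIGMAS = [0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072, 0.072,
               0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089]


def oks(pred, gt, visibility, area, sigmas=COCO_SIGMAS):
    """Object Keypoint Similarity between one predicted and one ground-truth pose.

    pred, gt   : sequences of (x, y) keypoint coordinates
    visibility : per-keypoint flags; values > 0 mean the keypoint is labeled
    area       : object segment area, used as the squared scale s^2
    """
    eps = 1e-12  # guards against division by zero for degenerate objects
    total, labeled = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma  # COCO uses k_i = 2 * sigma_i
        total += math.exp(-d2 / (2.0 * (area + eps) * k * k))
        labeled += 1
    return total / labeled if labeled else 0.0


def oks_loss(pred, gt, visibility, area, sigmas=COCO_SIGMAS):
    """Hypothetical loss form: 1 - OKS, so a perfect pose gives zero loss."""
    return 1.0 - oks(pred, gt, visibility, area, sigmas)
```

Because every term is a smooth function of the predicted coordinates, the same expression written in an autodiff framework would yield gradients with respect to the keypoints directly, which is what distinguishes it from a surrogate L1 loss on heatmaps.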

updated: Thu Apr 14 2022 08:02:40 GMT+0000 (UTC)

published: Thu Apr 14 2022 08:02:40 GMT+0000 (UTC)

arXiv
