Human pose estimation is important for visual understanding tasks such as action recognition and human-computer interaction. In this work, we present a Multiple Stage High-Resolution Network (Multi-Stage HRNet) to tackle the problem of multi-person pose estimation in images. Specifically, we follow the top-down pipeline and maintain high-resolution representations during single-person pose estimation. In addition, we adopt a multi-stage network and cross-stage feature aggregation to further refine keypoint positions. The resulting approach achieves promising results on the COCO dataset: with a single model and single-scale testing, it obtains 77.1 AP on test-dev using publicly available training data.
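The abstract does not specify how cross-stage feature aggregation is implemented. The following is a minimal sketch under an assumed simple additive scheme: feature maps from the previous stage (for example, its encoder and decoder outputs at one resolution) each pass through a learned 1x1 convolution and are summed into the input of the next stage. The function names `conv1x1` and `aggregate_cross_stage` are hypothetical, and the weight matrices stand in for learned parameters.

```python
import numpy as np


def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical 1x1 convolution (no bias).

    x: feature map of shape (C_in, H, W)
    weight: channel-mixing matrix of shape (C_out, C_in)
    Returns a feature map of shape (C_out, H, W).
    """
    return np.einsum("oc,chw->ohw", weight, x)


def aggregate_cross_stage(next_input, prev_features, weights):
    """Assumed additive cross-stage aggregation (illustrative only).

    next_input: (C, H, W) input to stage s+1
    prev_features: list of (C, H, W) maps from stage s at the same resolution
    weights: list of (C, C) matrices, one per entry of prev_features
    """
    # Copy as float so the caller's array is never modified in place.
    out = np.array(next_input, dtype=float)
    for feat, w in zip(prev_features, weights):
        # Transform each previous-stage feature map, then add it in.
        out += conv1x1(feat, w)
    return out
```

With identity weights the result is simply the input plus the previous-stage features; in a trained network the 1x1 convolutions would let each stage choose how much of the earlier representation to reuse, which is one plausible way aggregation can ease refinement across stages.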
updated: Mon Oct 14 2019 03:08:03 GMT+0000 (UTC)
published: Mon Oct 14 2019 03:08:03 GMT+0000 (UTC)