Detecting human in a crowd is a challenging problem due to the uncertainties of occlusion patterns. In this paper, we propose to handle the crowd occlusion problem in human detection by leveraging the head part. Double Anchor RPN is developed to capture body and head parts in pairs. A proposal crossover strategy is introduced to generate high-quality proposals for both parts as a training augmentation. Features of coupled proposals are then aggregated efficiently to exploit the inherent relationship. Finally, a Joint NMS module is developed for robust post-processing. The proposed framework, called Double Anchor R-CNN, is able to detect the body and head for each person simultaneously in crowded scenarios. State-of-the-art results are reported on challenging human detection datasets. Our model yields log-average miss rates (MR) of 51.79pp on CrowdHuman, 55.01pp on COCOPersons~(crowded sub-dataset) and 40.02pp on CrowdPose~(crowded sub-dataset), which outperforms previous baseline detectors by 3.57pp, 3.82pp, and 4.24pp, respectively. We hope our simple and effective approach will serve as a solid baseline and help ease future research in crowded human detection.
updated: Sun Sep 22 2019 13:01:08 GMT+0000 (UTC)
published: Sun Sep 22 2019 13:01:08 GMT+0000 (UTC)