Head pose estimation, which computes the intrinsic Euler angles (yaw, pitch, roll) of the human head, is crucial for gaze estimation, face alignment, and 3D reconstruction. Traditional approaches rely heavily on the accuracy of facial landmarks, which limits their performance, especially when the face is poorly visible. In this paper, to estimate head pose without facial landmarks, we combine coarse and fine regression outputs in a deep network. A fine classifier, which uses more quantization units for the angles, is trained with the help of auxiliary coarse units. Integrating regression is then adopted to obtain the final prediction. The proposed approach is evaluated on three challenging benchmarks. It achieves state-of-the-art results on AFLW2000 and BIWI and performs favorably on AFLW. The code has been released on GitHub.
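The coarse-to-fine idea can be illustrated with a minimal sketch for a single angle. This is not the released implementation: the angle range, the bin counts, the loss weights, and all function names (`bin_index`, `expected_angle`, `hybrid_loss`) are assumptions chosen for illustration. The sketch quantizes an angle at several granularities, sums a cross-entropy term per granularity, and adds a regression term on the expectation over the fine bins.

```python
import math

# Assumed angle range in degrees (hypothetical, not taken from the paper).
ANGLE_MIN, ANGLE_MAX = -99.0, 99.0


def bin_index(angle, num_bins):
    """Quantize an angle into one of num_bins equal-width units."""
    width = (ANGLE_MAX - ANGLE_MIN) / num_bins
    idx = int((angle - ANGLE_MIN) // width)
    return min(max(idx, 0), num_bins - 1)  # clamp out-of-range angles


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_angle(logits):
    """Integrate over the bins: probability-weighted sum of bin centers."""
    num_bins = len(logits)
    width = (ANGLE_MAX - ANGLE_MIN) / num_bins
    probs = softmax(logits)
    return sum(p * (ANGLE_MIN + (i + 0.5) * width) for i, p in enumerate(probs))


def cross_entropy(logits, target):
    """Negative log-likelihood of the target bin."""
    return -math.log(softmax(logits)[target])


def hybrid_loss(logits_by_level, true_angle, coarse_weight=1.0, reg_weight=1.0):
    """Combine fine and coarse classification with a regression term.

    logits_by_level: list of logit lists, finest granularity first.
    coarse_weight and reg_weight are hypothetical hyperparameters.
    """
    fine = logits_by_level[0]
    loss = cross_entropy(fine, bin_index(true_angle, len(fine)))
    for coarse in logits_by_level[1:]:
        target = bin_index(true_angle, len(coarse))
        loss += coarse_weight * cross_entropy(coarse, target)
    # Regression on the expectation over the fine bins.
    loss += reg_weight * (expected_angle(fine) - true_angle) ** 2
    return loss
```

In a real network the logit lists would come from separate classification heads sharing one backbone, with one such set of heads per angle (yaw, pitch, roll); here they are plain Python lists so the arithmetic stays visible.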
updated: Wed Oct 02 2019 23:25:54 GMT+0000 (UTC)
published: Mon Jan 21 2019 03:07:05 GMT+0000 (UTC)