Recovering 3D human pose from 2D joints is still a challenging problem, especially without any 3D annotation, video information, or multi-view information. In this paper, we present an unsupervised GAN-based model consisting of multiple weight-sharing generators to estimate a 3D human pose from a single image without 3D annotations. In our model, we introduce single-view-multi-angle consistency (SVMAC) to significantly improve the estimation performance. With 2D joint locations as input, our model estimates a 3D pose and a camera simultaneously. During training, the estimated 3D pose is rotated by random angles and the estimated camera projects the rotated 3D poses back to 2D. The 2D reprojections will be fed into weight-sharing generators to estimate the corresponding 3D poses and cameras, which are then mixed to impose SVMAC constraints to self-supervise the training process. The experimental results show that our method outperforms the state-of-the-art unsupervised methods by 2.6% on Human 3.6M and 15.0% on MPI-INF-3DHP. Moreover, qualitative results on MPII and LSP show that our method can generalize well to unknown data.
updated: Sun Aug 08 2021 02:00:58 GMT+0000 (UTC)
published: Thu Jun 10 2021 09:43:57 GMT+0000 (UTC)