Pose estimation is a vital step in many robotics and perception tasks such as robotic manipulation, autonomous vehicle navigation, etc. Current state-of-the-art pose estimation methods rely on deep neural networks with complicated structures and long inference times. While highly robust, they require computing power often unavailable on mobile robots. We propose a CNN-based pose refinement system which takes a coarsely estimated 3D pose from a computationally cheaper algorithm along with a bounding box image of the object, and returns a highly refined pose. Our experiments on the YCB-Video dataset show that our system can refine 3D poses to an extremely high precision with minimal training data.
updated: Sun Nov 17 2019 21:40:05 GMT+0000 (UTC)
published: Sun Nov 17 2019 21:40:05 GMT+0000 (UTC)