Although automatic gaze estimation is very important to a large variety of application areas, it is difficult to train accurate and robust gaze models, in great part due to the difficulty in collecting large and diverse data (annotating 3D gaze is expensive and existing datasets use different setups). To address this issue, our main contribution in this paper is to propose an effective approach to learn a low dimensional gaze representation without gaze annotations, which to the best of our best knowledge, is the first work to do so. The main idea is to rely on a gaze redirection network and use the gaze representation difference of the input and target images (of the redirection network) as the redirection variable. A redirection loss in image domain allows the joint training of both the redirection network and the gaze representation network. In addition, we propose a warping field regularization which not only provides an explicit physical meaning to the gaze representations but also avoids redirection distortions. Promising results on few-shot gaze estimation (competitive results can be achieved with as few as <= 100 calibration samples), cross-dataset gaze estimation, gaze network pretraining, and another task (head pose estimation) demonstrate the validity of our framework.
updated: Fri Apr 03 2020 14:30:02 GMT+0000 (UTC)
published: Sat Nov 16 2019 02:46:04 GMT+0000 (UTC)