Monocular depth estimation aims to predict depth from a single image or video. Recently, self-supervised methods have drawn much attention because they require no depth annotations and achieve impressive performance on daytime benchmarks such as KITTI and Cityscapes. However, they produce weird depth outputs in more challenging nighttime scenarios because of low visibility and varying illumination, which respectively cause weak textures and break the brightness-consistency assumption. To address these problems, we propose a novel framework with three improvements: (1) we introduce Priors-Based Regularization to learn distribution knowledge from unpaired depth maps and prevent the model from being trained incorrectly; (2) we leverage a Mapping-Consistent Image Enhancement module to enhance image visibility and contrast while maintaining brightness consistency; and (3) we present a Statistics-Based Mask strategy that uses dynamic statistics to tune the number of pixels removed within textureless regions. Experimental results demonstrate the effectiveness of each component. Overall, our framework achieves remarkable improvements and state-of-the-art results on two nighttime datasets.
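To illustrate why an enhancement can raise visibility and contrast without breaking brightness consistency, here is a minimal sketch in Python. It assumes, purely for illustration, that the enhancement is a single global, monotonic lookup table built by histogram equalization on the target frame and then applied unchanged to the source frame. The function names (`equalization_mapping`, `apply_mapping`, `enhance_pair`) are hypothetical, and this is not the paper's Mapping-Consistent Image Enhancement module, only the shared-mapping idea behind its name.

```python
from typing import List, Tuple

# Grayscale image as rows of 8-bit intensities (0..255); an assumption for this sketch.
Image = List[List[int]]


def equalization_mapping(img: Image, levels: int = 256) -> List[int]:
    """Build a monotonic lookup table by histogram equalization of `img`.

    Hypothetical helper: maps each intensity v to round((levels - 1) * CDF(v)).
    Assumes a non-empty image with intensities in [0, levels).
    """
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    lut: List[int] = []
    cumulative = 0
    for count in hist:
        cumulative += count  # the CDF never decreases, so the table is monotonic
        lut.append(round((levels - 1) * cumulative / total))
    return lut


def apply_mapping(img: Image, lut: List[int]) -> Image:
    """Apply a per-intensity lookup table to every pixel."""
    return [[lut[v] for v in row] for row in img]


def enhance_pair(target: Image, source: Image) -> Tuple[Image, Image]:
    """Enhance both frames with ONE mapping computed from the target frame.

    Sharing the mapping keeps equal input intensities equal after enhancement,
    which is the property a brightness-consistency (photometric) loss relies on.
    """
    lut = equalization_mapping(target)
    return apply_mapping(target, lut), apply_mapping(source, lut)


if __name__ == "__main__":
    # Tiny dark "frames": the source holds the same intensities at shifted positions.
    target = [[10, 10, 20], [20, 30, 40]]
    source = [[20, 30, 40], [10, 10, 50]]
    enhanced_target, enhanced_source = enhance_pair(target, source)
    print(enhanced_target)
    print(enhanced_source)
```

Because both frames pass through one lookup table, a pixel with the same intensity in both frames keeps the same intensity after enhancement, so a photometric loss between them stays meaningful. Equalizing each frame separately would generally map the same intensity to different outputs whenever the two histograms differ.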
updated: Mon Aug 09 2021 06:24:35 GMT+0000 (UTC)
published: Mon Aug 09 2021 06:24:35 GMT+0000 (UTC)