In this paper, we consider adversarial attacks against monocular depth estimation (MDE) systems based on convolutional neural networks (CNNs). The motivation is two-fold. One aim is to study the security of MDE systems, which has received little attention in the community. The other is to improve our understanding of the computational mechanism of CNNs performing MDE. Toward this end, we apply a recently proposed method for visualizing MDE to defending against attacks. This method trains another CNN to predict a saliency map from an input image, such that the CNN for MDE continues to estimate the depth map accurately from the image with its non-salient part masked out. We report the following findings. First, unsurprisingly, attacks by IFGSM (or equivalently PGD) succeed in making the CNNs yield inaccurate depth estimates. Second, the attacks can be defended against by masking out non-salient pixels, indicating that the attacks function mostly by perturbing non-salient pixels. However, the prediction of saliency maps is itself vulnerable to the attacks, even though it is not their direct target. We show that the attacks can be defended against by using a saliency map predicted by a CNN trained to be robust to them. These results provide an effective defense method as well as a clue to understanding the computational mechanism of CNNs for MDE.
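The two operations named in the abstract, an iterative sign-gradient attack and masking out non-salient pixels, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the "depth estimator" is a toy linear function rather than a CNN, the loss is a squared error, the saliency mask is fixed by hand rather than predicted, and masked pixels are set to zero. The names `predict`, `ifgsm`, and `apply_mask` are hypothetical and do not come from the paper.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def predict(weights, image):
    """Toy linear 'depth estimator' (assumption): one scalar from a flat image."""
    return sum(w * x for w, x in zip(weights, image))


def ifgsm(weights, image, target, eps=0.1, alpha=0.02, steps=10):
    """Iterative FGSM sketch: ascend the squared error within an L-inf ball."""
    adv = list(image)
    for _ in range(steps):
        err = predict(weights, adv) - target
        # Gradient of (prediction - target)^2 with respect to pixel i.
        grad = [2.0 * err * w for w in weights]
        # Step along the sign of the gradient to increase the error.
        adv = [a + alpha * sign(g) for a, g in zip(adv, grad)]
        # Project back into the eps-ball around the clean image and into [0, 1].
        adv = [min(max(a, x - eps, 0.0), x + eps, 1.0)
               for a, x in zip(adv, image)]
    return adv


def apply_mask(image, mask):
    """Zero out non-salient pixels (mask value 0); keep salient ones (value 1)."""
    return [x * m for x, m in zip(image, mask)]


# Hypothetical toy data: four "pixels", a reference depth, a hand-made mask.
weights = [0.8, 0.6, 0.1, 0.1]
image = [0.5, 0.4, 0.6, 0.3]
target = 0.70
mask = [1, 1, 0, 0]

clean_error = abs(predict(weights, image) - target)
adversarial = ifgsm(weights, image, target)
attacked_error = abs(predict(weights, adversarial) - target)
masked = apply_mask(adversarial, mask)
```

In this toy setting the attack raises the prediction error while keeping every pixel within `eps` of the clean image, and masking discards whatever perturbation falls on the non-salient pixels. The sketch shows only the mechanics of the two operations; it does not reproduce the reported finding that attacks on real MDE networks concentrate on non-salient pixels.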
updated: Wed Nov 20 2019 09:41:53 GMT+0000 (UTC)
published: Wed Nov 20 2019 09:41:53 GMT+0000 (UTC)