arXiv reaDer
VOICE: Variance of Induced Contrastive Explanations to quantify Uncertainty in Neural Network Interpretability
In this paper, we visualize and quantify the predictive uncertainty of gradient-based post hoc visual explanations for neural networks. Predictive uncertainty refers to the variability in the network predictions under perturbations to the input. Visual post hoc explainability techniques highlight features within an image to justify a network's prediction. We theoretically show that existing evaluation strategies of visual explanatory techniques partially reduce the predictive uncertainty of neural networks. This analysis allows us to construct a plug in approach to visualize and quantify the remaining predictive uncertainty of any gradient-based explanatory technique. We show that every image, network, prediction, and explanatory technique has a unique uncertainty. The proposed uncertainty visualization and quantification yields two key observations. Firstly, oftentimes under incorrect predictions, explanatory techniques are uncertain about the same features that they are attributing the predictions to, thereby reducing the trustworthiness of the explanation. Secondly, objective metrics of an explanation's uncertainty, empirically behave similarly to epistemic uncertainty. We support these observations on two datasets, four explanatory techniques, and six neural network architectures. The code is available at
updated: Sat Jun 01 2024 23:32:29 GMT+0000 (UTC)
published: Sat Jun 01 2024 23:32:29 GMT+0000 (UTC)
参考文献 (このサイトで利用可能なもの) / References (only if available on this site)
被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)アソシエイト