On The Coherence of Quantitative Evaluation of Visual Explanations

Benjamin Vandersmissen; Jose Oramas

Recent years have seen increased development of methods for justifying the predictions of neural networks through visual explanations. These explanations usually take the form of heatmaps that assign a saliency (or relevance) value to each pixel of the input image, expressing how relevant the pixel is for the prediction of a label. Complementing this development, evaluation methods have been proposed to assess the "goodness" of such explanations. On the one hand, some of these methods rely on synthetic datasets, which has the weakness of offering limited guarantees regarding their applicability to more realistic settings. On the other hand, some methods rely on metrics for objective evaluation. However, how some of these evaluation methods perform relative to one another is uncertain. Taking this into account, we conduct a comprehensive study on a subset of the ImageNet-1k validation set, in which we evaluate a number of commonly used explanation methods following a set of evaluation methods. We complement our study with sanity checks on the studied evaluation methods as a means to investigate their reliability and the impact of characteristics of the explanations on the evaluation methods. The results of our study suggest a lack of coherence in the grading provided by some of the considered evaluation methods. Moreover, we have identified some characteristics of the explanations, e.g. sparsity, which can have a significant effect on the performance.
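
As a rough illustration of what checking the coherence of gradings could look like, the sketch below ranks a handful of explanation methods under two evaluation metrics and computes the Spearman rank correlation between the two rankings. This is an assumption made for illustration only: the abstract does not state which agreement measure the study uses, the helper names `ranks` and `spearman` are hypothetical, and the scores are made-up placeholders rather than results from the paper.

```python
def ranks(scores):
    """Rank scores from best (1) to worst, assuming higher is better and no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(scores_a, scores_b):
    """Spearman rank correlation for two tie-free score lists of equal length."""
    n = len(scores_a)
    ranks_a, ranks_b = ranks(scores_a), ranks(scores_b)
    # Sum of squared rank differences across the explanation methods.
    d2 = sum((x - y) ** 2 for x, y in zip(ranks_a, ranks_b))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical scores for four explanation methods under two evaluation metrics.
# These numbers are placeholders, not values reported in the study.
metric_a = [0.81, 0.64, 0.55, 0.30]
metric_b = [0.20, 0.75, 0.40, 0.90]

rho = spearman(metric_a, metric_b)
print(f"Spearman correlation between the two gradings: {rho:.2f}")
```

A correlation near 1 would mean the two evaluation methods grade the explanation methods in nearly the same order, while a value near 0 or below would indicate the kind of disagreement the abstract describes.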

updated: Fri Mar 03 2023 14:36:30 GMT+0000 (UTC)

published: Tue Feb 14 2023 13:41:57 GMT+0000 (UTC)

arXiv
