Evaluation of Interpretability Methods and Perturbation Artifacts in Deep Neural Networks

Lennart Brocki; Neo Christopher Chung

Despite the excellent performance of deep neural networks (DNNs) in image classification, detection, and prediction, characterizing how a DNN arrives at a given decision remains an open problem, which has given rise to a number of interpretability methods. Post-hoc interpretability methods primarily aim to quantify the importance of input features with respect to the class probabilities. However, because ground truth is lacking and interpretability methods have diverse operating characteristics, evaluating these methods is a crucial challenge. A popular approach to evaluating interpretability methods is to perturb the input features deemed important for a given prediction and observe the decrease in accuracy. However, perturbation itself may introduce artifacts, since perturbed images may be out-of-distribution (OOD). In this paper, we conduct computational experiments to estimate the contribution of perturbation artifacts and develop a method to estimate the fidelity of interpretability methods. We demonstrate that, while perturbation artifacts do exist, their impact on fidelity estimation can be minimized and characterized by using the model accuracy curves obtained from perturbing input features in the Most Important First (MIF) and Least Important First (LIF) orders. Using a ResNet-50 trained on ImageNet, we demonstrate the proposed fidelity estimation for four popular post-hoc interpretability methods.
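The MIF/LIF perturbation idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: a toy linear classifier stands in for the ResNet-50, the attribution is the magnitude of input times weight, perturbed features are replaced by a zero baseline, and the area between the LIF and MIF accuracy curves is used as a simple summary rather than the authors' fidelity estimator. The function names (`perturb`, `accuracy_curve`, `area_between`) are hypothetical.

```python
import numpy as np

def perturb(X, order, k, baseline=0.0):
    """Replace the first k features in each row's `order` with `baseline`."""
    Xp = X.copy()
    rows = np.arange(X.shape[0])[:, None]
    Xp[rows, order[:, :k]] = baseline
    return Xp

def accuracy_curve(predict, X, y, importance, most_important_first, steps=10):
    """Accuracy as a growing fraction of features is perturbed in MIF or LIF order."""
    order = np.argsort(importance, axis=1)      # ascending: least important first
    if most_important_first:
        order = order[:, ::-1]                  # descending: most important first
    n_features = X.shape[1]
    ks = np.linspace(0, n_features, steps + 1).astype(int)
    acc = [float(np.mean(predict(perturb(X, order, k)) == y)) for k in ks]
    return ks / n_features, np.array(acc)

def area_between(fracs, acc_lif, acc_mif):
    """Trapezoidal area between the LIF and MIF accuracy curves."""
    diff = acc_lif - acc_mif
    return float(np.sum((diff[1:] + diff[:-1]) / 2.0 * np.diff(fracs)))

# Toy setup (assumption): a linear classifier labelled by its own predictions,
# so unperturbed accuracy is 1.0 by construction.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
w = rng.normal(size=20)
predict = lambda A: (A @ w > 0).astype(int)
y = predict(X)

# Two attribution candidates: input-times-weight magnitude and random scores.
faithful = np.abs(X * w)
random_scores = rng.random(X.shape)

fracs, acc_mif = accuracy_curve(predict, X, y, faithful, most_important_first=True)
_, acc_lif = accuracy_curve(predict, X, y, faithful, most_important_first=False)
area_faithful = area_between(fracs, acc_lif, acc_mif)

_, rnd_mif = accuracy_curve(predict, X, y, random_scores, most_important_first=True)
_, rnd_lif = accuracy_curve(predict, X, y, random_scores, most_important_first=False)
area_random = area_between(fracs, rnd_lif, rnd_mif)

print(f"area (input x weight): {area_faithful:.3f}")
print(f"area (random scores):  {area_random:.3f}")
```

Both orders apply the same kind of perturbation to the same number of features at each step, so comparing the two curves is one way to separate the effect of feature importance from the effect of the perturbation itself. In this sketch an informative attribution should yield a wide gap between the LIF and MIF curves, while random scores should yield a gap near zero.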

updated: Mon Mar 06 2023 15:26:24 GMT+0000 (UTC)

published: Sun Mar 06 2022 10:14:09 GMT+0000 (UTC)

arXiv
