Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability

Roman Levin; Manli Shu; Eitan Borgnia; Furong Huang; Micah Goldblum; Tom Goldstein

Conventional saliency maps highlight input features to which neural network predictions are highly sensitive. We take a different approach to saliency, in which we identify and analyze the network parameters, rather than inputs, which are responsible for erroneous decisions. We find that samples which cause similar parameters to malfunction are semantically similar. We also show that pruning the most salient parameters for a wrongly classified sample often improves model behavior. Furthermore, fine-tuning a small number of the most salient parameters on a single sample results in error correction on other samples that are misclassified for similar reasons. Based on our parameter saliency method, we also introduce an input-space saliency technique that reveals how image features cause specific network components to malfunction. Further, we rigorously validate the meaningfulness of our saliency maps on both the dataset and case-study levels.
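
As a rough illustration of the idea of ranking parameters rather than input features, the sketch below scores each weight of a toy linear softmax classifier by the magnitude of the loss gradient with respect to that weight, for a single misclassified sample. The abstract does not spell out the saliency formula, so the gradient-magnitude score, the toy model, and the names `parameter_saliency` and `top_salient` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def parameter_saliency(weights, x, label):
    """Return |dLoss/dW| for a toy linear softmax classifier.

    Assumption: saliency is taken to be the absolute gradient of the
    cross-entropy loss with respect to each weight. For logits = W x,
    dLoss/dW[c][j] = (p_c - 1[c == label]) * x[j].
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(logits)
    return [
        [abs((p - (1.0 if c == label else 0.0)) * xi) for xi in x]
        for c, p in enumerate(probs)
    ]


def top_salient(saliency, k):
    """Return the (class, feature) indices of the k most salient weights."""
    flat = [(s, c, j) for c, row in enumerate(saliency) for j, s in enumerate(row)]
    flat.sort(key=lambda t: -t[0])
    return [(c, j) for _, c, j in flat[:k]]


# A sample whose true label is 1 but which the model assigns to class 0.
weights = [[1.0, 0.0], [0.0, 1.0]]
x = [2.0, 0.0]
saliency = parameter_saliency(weights, x, label=1)
most_salient = top_salient(saliency, k=2)
```

In this toy case the two weights attached to the active input feature receive all of the saliency, while weights attached to the zero-valued feature receive none. Pruning or fine-tuning would then target the indices in `most_salient`.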

updated: Tue Aug 03 2021 07:32:34 GMT+0000 (UTC)

published: Tue Aug 03 2021 07:32:34 GMT+0000 (UTC)

arXiv
