Improving Deep Learning Interpretability by Saliency Guided Training

Aya Abdelsalam Ismail; Héctor Corrada Bravo; Soheil Feizi

Saliency methods have been widely used to highlight important input features in model predictions. Most existing methods use backpropagation on a modified gradient function to generate saliency maps. Thus, noisy gradients can result in unfaithful feature attributions. In this paper, we tackle this issue and introduce a saliency guided training procedure for neural networks to reduce noisy gradients used in predictions while retaining the predictive performance of the model. Our saliency guided training procedure iteratively masks features with small and potentially noisy gradients while maximizing the similarity of model outputs for both masked and unmasked inputs. We apply the saliency guided training procedure to various synthetic and real data sets from computer vision, natural language processing, and time series across diverse neural architectures, including Recurrent Neural Networks, Convolutional Networks, and Transformers. Through qualitative and quantitative evaluations, we show that the saliency guided training procedure significantly improves model interpretability across various domains while preserving predictive performance.
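The masking-plus-similarity objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: a linear softmax classifier stands in for the neural network, KL divergence is assumed as the similarity measure, masked features are assumed to be replaced by random values drawn from the input's own range, and the names `k` (number of masked features) and `lam` (weight of the similarity term) are hypothetical.

```python
import math
import random

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def predict(W, b, x):
    """Class probabilities of a linear softmax model (stand-in for a network)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
    return softmax(logits)

def input_gradient(W, b, x, y):
    """Gradient of the cross-entropy loss with respect to the input x."""
    p = predict(W, b, x)
    # d loss / d logit_c = p_c - 1[c == y]; chain through logits = W x + b.
    d = [pc - (1.0 if c == y else 0.0) for c, pc in enumerate(p)]
    return [sum(d[c] * W[c][j] for c in range(len(W))) for j in range(len(x))]

def mask_low_saliency(x, grad, k, rng):
    """Replace the k features with the smallest |gradient| by random values.

    Assumption: replacement values are uniform over the input's own range.
    """
    order = sorted(range(len(x)), key=lambda j: abs(grad[j]))
    lo, hi = min(x), max(x)
    masked = list(x)
    for j in order[:k]:
        masked[j] = rng.uniform(lo, hi)
    return masked

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), assumed here as the output-similarity measure."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def saliency_guided_loss(W, b, x, y, k, lam, rng):
    """Cross-entropy on the clean input plus lam * KL(clean || masked) outputs."""
    grad = input_gradient(W, b, x, y)
    x_masked = mask_low_saliency(x, grad, k, rng)
    p = predict(W, b, x)
    p_masked = predict(W, b, x_masked)
    cross_entropy = -math.log(p[y] + 1e-12)
    return cross_entropy + lam * kl_divergence(p, p_masked)

# Toy example: feature 1 has zero weight in every class, so its gradient is zero.
W = [[1.0, 0.0, -1.0], [-1.0, 0.0, 1.0]]
b = [0.0, 0.0]
x = [0.5, 2.0, -0.3]
rng = random.Random(0)
loss = saliency_guided_loss(W, b, x, y=0, k=1, lam=1.0, rng=rng)
print(loss)
```

In the toy example, the feature with zero gradient is the one that gets masked, so the masked and unmasked outputs coincide and the similarity term contributes nothing. In an actual training loop, this loss would be minimized with respect to the model parameters, and the mask would be recomputed at every step from the current gradients.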

updated: Mon Nov 29 2021 06:05:23 GMT+0000 (UTC)

published: Mon Nov 29 2021 06:05:23 GMT+0000 (UTC)

arXiv
