Rethinking interpretation: Input-agnostic saliency mapping of deep visual classifiers

Naveed Akhtar; Mohammad A. A. K. Jalwana

Saliency methods provide post-hoc model interpretation by attributing input features to the model outputs. Current methods mainly achieve this using a single input sample, and therefore cannot answer input-independent inquiries about the model. We also show that input-specific saliency mapping is intrinsically susceptible to misleading feature attribution. Current attempts to use 'general' input features for model interpretation assume access to a dataset containing those features, which biases the interpretation. To address this gap, we introduce a new perspective of input-agnostic saliency mapping that computationally estimates the high-level features attributed by the model to its outputs. These features are geometrically correlated, and are computed by accumulating the model's gradient information with respect to an unrestricted data distribution. To compute these features, we nudge independent data points over the model loss surface towards the local minima associated with a human-understandable concept, e.g., a class label for classifiers. With a systematic projection, scaling, and refinement process, this information is transformed into an interpretable visualization without compromising its model fidelity. The visualization serves as a stand-alone qualitative interpretation. With an extensive evaluation, we not only demonstrate successful visualizations of a variety of concepts for large-scale models, but also showcase an interesting utility of this new form of saliency mapping by identifying backdoor signatures in compromised classifiers.
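The gradient-accumulation idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a linear softmax classifier, uniform noise as a stand-in for the "unrestricted data distribution", and plain gradient descent on the cross-entropy loss, and it omits the projection, scaling, and refinement steps entirely. All names (`loss_grad_wrt_input`, `accumulate_gradients`) and hyperparameters are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss_grad_wrt_input(W, b, x, target):
    # Gradient of cross-entropy loss w.r.t. the inputs x of a linear
    # softmax model (an assumed toy model, not a deep classifier).
    p = softmax(x @ W + b)      # shape (n_points, n_classes)
    p[:, target] -= 1.0         # dL/dlogits = p - onehot(target)
    return p @ W.T              # shape (n_points, dim)

def accumulate_gradients(W, b, target, n_points=64, steps=20, lr=0.05, seed=0):
    # Draw independent points from uniform noise (assumption), nudge each
    # one downhill on the loss for `target`, and accumulate the mean
    # descent direction over all steps.
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(n_points, W.shape[0]))
    acc = np.zeros(W.shape[0])
    for _ in range(steps):
        g = loss_grad_wrt_input(W, b, x, target)
        x = x - lr * g              # nudge points toward lower loss
        acc += -g.mean(axis=0)      # accumulate input-agnostic signal
    return acc, x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = 0.5 * rng.standard_normal((16, 3))   # hypothetical toy weights
    b = np.zeros(3)
    acc, _ = accumulate_gradients(W, b, target=2)
    print(acc.shape)
```

In this sketch the accumulated vector `acc` depends only on the model and the chosen class, not on any particular real input, which is the sense in which such a map is input-agnostic.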

updated: Fri Mar 31 2023 06:58:45 GMT+0000 (UTC)

published: Fri Mar 31 2023 06:58:45 GMT+0000 (UTC)

arXiv
