GLANCE: Global to Local Architecture-Neutral Concept-based Explanations

Avinash Kori; Ben Glocker; Francesca Toni

Most current explainability techniques focus on capturing the importance of features in input space. However, given the complexity of models and data-generating processes, the resulting explanations are far from 'complete': they give no indication of feature interactions and no visualization of their 'effect'. In this work, we propose a novel twin-surrogate explainability framework to explain the decisions made by any CNN-based image classifier, irrespective of its architecture. We first disentangle latent features from the classifier, then align these features to observed/human-defined 'context' features. The aligned features form semantically meaningful concepts, which are used to extract a causal graph depicting the 'perceived' data-generating process and describing the inter- and intra-feature interactions between unobserved latent features and observed 'context' features. This causal graph serves as a global model from which local explanations of different forms can be extracted. Specifically, we provide a generator to visualize the 'effect' of interactions among features in latent space, and we draw feature importance from it as local explanations. Our framework uses adversarial knowledge distillation to faithfully learn a representation from the classifier's latent space and uses it for extracting visual explanations. We use the StyleGAN-v2 architecture with an additional regularization term to enforce disentanglement and alignment. We demonstrate and evaluate explanations obtained with our framework on Morpho-MNIST and on the FFHQ human faces dataset. Our framework is available at https://github.com/koriavinash1/GLANCE-Explanations.
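The abstract mentions knowledge distillation as the mechanism by which a surrogate learns from the classifier. As a rough illustration only, the sketch below shows the generic, non-adversarial distillation objective: a KL divergence between the classifier's (teacher's) softened output distribution and the surrogate's (student's). This is not the authors' adversarial variant, and the function names, the temperature parameter, and the example logits are all assumptions introduced here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature (assumed parameter)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over softened class probabilities.

    Hypothetical helper: a generic distillation loss, not the paper's
    adversarial formulation.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Illustrative logits only: a surrogate that matches the classifier has zero loss,
# while a mismatched surrogate has a strictly positive loss.
matched = distillation_kl([2.0, 0.0, -1.0], [2.0, 0.0, -1.0])
mismatched = distillation_kl([2.0, 0.0, -1.0], [0.0, 0.0, 0.0])
```

Minimizing such a loss pushes the surrogate's predictions toward the classifier's, which is the sense in which a surrogate can be said to represent the classifier faithfully; the framework described above additionally involves an adversarial component and a generator that this sketch does not attempt to model.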

updated: Tue Jul 05 2022 09:52:09 GMT+0000 (UTC)

published: Tue Jul 05 2022 09:52:09 GMT+0000 (UTC)

arXiv
