Dependency Decomposition and a Reject Option for Explainable Models

Jan Kronenberger; Anselm Haselhoff

Deploying machine learning models in safety-related domains (e.g., autonomous driving, medical diagnosis) demands approaches that are explainable, robust against adversarial attacks, and aware of the model uncertainty. Recent deep learning models perform extremely well in various inference tasks, but the black-box nature of these approaches leads to weaknesses regarding the three requirements mentioned above. Recent advances offer methods to visualize features, describe the attribution of the input (e.g., heatmaps), provide textual explanations, or reduce dimensionality. However, are explanations for classification tasks dependent on, or independent of, each other? For instance, is the shape of an object dependent on its color? What is the effect of using the predicted class for generating explanations, and vice versa? In the context of explainable deep learning models, we present the first analysis of dependencies regarding the probability distribution over the desired image classification outputs and the explaining variables (e.g., attributes, texts, heatmaps). To this end, we perform an Explanation Dependency Decomposition (EDD). We analyze the implications of the different dependencies and propose two ways of generating the explanation. Finally, we use the explanation to verify (accept or reject) the prediction.
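
The abstract does not spell out the decomposition, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a single discrete class variable y and a single discrete explanation variable e, with the joint p(y, e | x) for one input given as a small table with strictly positive marginals. By the chain rule, this joint factorizes either as p(y | x) p(e | y, x) (predict the class, then explain it) or as p(e | x) p(y | e, x) (explain first, then predict). The function names, the toy numbers, and the threshold-based reject rule are all hypothetical choices made for illustration.

```python
import numpy as np


def decompose(joint):
    """Split joint[y, e] = p(y, e | x) into marginals and conditionals.

    Assumes every row and column sum is strictly positive.
    """
    joint = np.asarray(joint, dtype=float)
    p_y = joint.sum(axis=1)               # p(y | x)
    p_e = joint.sum(axis=0)               # p(e | x)
    p_e_given_y = joint / p_y[:, None]    # p(e | y, x), rows sum to 1
    p_y_given_e = joint / p_e[None, :]    # p(y | e, x), columns sum to 1
    return p_y, p_e, p_e_given_y, p_y_given_e


def predict_with_reject(joint, threshold=0.5):
    """Predict a class, explain it, and accept only if the explanation
    supports the class (hypothetical rule: p(y | e, x) >= threshold)."""
    p_y, _, p_e_given_y, p_y_given_e = decompose(joint)
    y_hat = int(np.argmax(p_y))                  # most probable class
    e_hat = int(np.argmax(p_e_given_y[y_hat]))   # explanation given that class
    accepted = bool(p_y_given_e[y_hat, e_hat] >= threshold)
    return y_hat, e_hat, accepted


# Toy joint over 2 classes (rows) and 2 explanations (columns); made-up numbers.
toy_joint = [[0.45, 0.15],
             [0.05, 0.35]]
result = predict_with_reject(toy_joint, threshold=0.5)
```

In this sketch the two factorizations reproduce the same joint, so the choice between them only changes the order in which class and explanation are generated; the reject decision then checks whether the generated explanation, taken on its own, still points back to the predicted class.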

updated: Fri Dec 11 2020 17:39:33 GMT+0000 (UTC)

published: Fri Dec 11 2020 17:39:33 GMT+0000 (UTC)

arXiv
