When are Post-hoc Conceptual Explanations Identifiable?

Tobias Leemann; Michael Kirchhof; Yao Rong; Enkelejda Kasneci; Gjergji Kasneci

Interest in understanding and factorizing learned embedding spaces through conceptual explanations is steadily growing. When no human concept labels are available, concept discovery methods search trained embedding spaces for interpretable concepts like object shape or color that can be used to provide post-hoc explanations for decisions. Unlike previous work, we argue that concept discovery should be identifiable, meaning that a number of known concepts can be provably recovered to guarantee reliability of the explanations. As a starting point, we make explicit the connection between concept discovery and classical methods like Principal Component Analysis and Independent Component Analysis by showing that these methods can recover independent concepts with non-Gaussian distributions. For dependent concepts, we propose two novel approaches that exploit functional compositionality properties of image-generating processes. Our provably identifiable concept discovery methods substantially outperform competitors on a battery of experiments including hundreds of trained models and dependent concepts, where they exhibit up to 29% better alignment with the ground truth. Our results provide a rigorous foundation for reliable concept discovery without human labels.
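The claim that classical methods can recover independent, non-Gaussian concepts can be illustrated with a small toy sketch. It is not the paper's method or code. It assumes two uniformly distributed concepts, a hypothetical 2x2 linear mixing standing in for the embedding, and a simple two-dimensional ICA: PCA whitening followed by a grid search for the rotation that maximizes absolute excess kurtosis. All function and variable names are illustrative assumptions.

```python
import numpy as np

def whiten(x):
    """PCA whitening: center the data and rescale to identity covariance."""
    x = x - x.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(x, rowvar=False))
    return x @ eigvec / np.sqrt(eigval)

def excess_kurtosis(z):
    """Per-column excess kurtosis (zero for a Gaussian)."""
    z = (z - z.mean(axis=0)) / z.std(axis=0)
    return (z ** 4).mean(axis=0) - 3.0

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def ica_2d(x, n_angles=360):
    """Whiten, then pick the rotation with the most non-Gaussian components.

    After whitening, only a rotation remains unknown; angles in [0, pi/2)
    cover all solutions up to permutation and sign of the components.
    """
    z = whiten(x)
    angles = np.linspace(0.0, np.pi / 2, n_angles, endpoint=False)
    scores = [np.abs(excess_kurtosis(z @ rotation(a))).sum() for a in angles]
    return z @ rotation(angles[int(np.argmax(scores))])

def max_abs_corr(est, true):
    """For each true concept, the best absolute correlation with any estimate."""
    k = est.shape[1]
    cross = np.corrcoef(est.T, true.T)[:k, k:]
    return np.abs(cross).max(axis=0)

rng = np.random.default_rng(0)
concepts = rng.uniform(-1.0, 1.0, size=(20000, 2))  # independent, non-Gaussian
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])         # hypothetical linear mixing
embeddings = concepts @ mixing.T

recovered = ica_2d(embeddings)
alignment = max_abs_corr(recovered, concepts)
print("alignment per concept:", alignment)
```

Whitening alone fixes the components only up to a rotation, so in this sketch it is the non-Gaussianity criterion that pins down the concept directions. With Gaussian concepts the kurtosis score would be flat in the angle and the rotation would stay ambiguous. Recovery is only up to the order and sign of the components, which is why alignment is measured with absolute correlations.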

updated: Tue Feb 21 2023 13:55:22 GMT+0000 (UTC)

published: Tue Jun 28 2022 10:21:17 GMT+0000 (UTC)

arXiv
