When are Post-hoc Conceptual Explanations Identifiable?

Tobias Leemann; Michael Kirchhof; Yao Rong; Enkelejda Kasneci; Gjergji Kasneci

Interest in understanding and factorizing learned embedding spaces through conceptual explanations is steadily growing. When no human concept labels are available, concept discovery methods search trained embedding spaces for interpretable concepts like object shape or color that can be used to provide post-hoc explanations for decisions. Unlike previous work, we argue that concept discovery should be identifiable, meaning that a number of known concepts can be provably recovered to guarantee reliability of the explanations. As a starting point, we explicitly make the connection between concept discovery and classical methods like Principal Component Analysis and Independent Component Analysis by showing that they can recover independent concepts with non-Gaussian distributions. For dependent concepts, we propose two novel approaches that exploit functional compositionality properties of image-generating processes. Our provably identifiable concept discovery methods substantially outperform competitors on a battery of experiments including hundreds of trained models and dependent concepts, where they exhibit up to 29% better alignment with the ground truth. Our results provide a rigorous foundation for reliable concept discovery without human labels.
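
The claim that classical methods can recover independent, non-Gaussian concepts can be illustrated with a small toy experiment. The sketch below is not the authors' method or code. It assumes two independent uniform "concepts", a linear mixing matrix standing in for an embedding, and a simple two-dimensional ICA procedure (PCA whitening followed by a grid search over rotations that maximizes absolute excess kurtosis). All names (`whiten`, `ica_2d`, `alignment`) and the mixing matrix are hypothetical choices made for illustration.

```python
import numpy as np

def excess_kurtosis(x):
    # Excess kurtosis is 0 for a Gaussian, so |kurtosis| measures non-Gaussianity.
    x = (x - x.mean()) / x.std()
    return np.mean(x ** 4) - 3.0

def whiten(X):
    # PCA whitening: rotate onto principal axes and rescale to unit variance.
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ eigvecs / np.sqrt(eigvals)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def ica_2d(X, n_angles=360):
    # After whitening, the remaining ambiguity is an orthogonal transform.
    # In 2D, a search over [0, pi/2) covers it up to sign and permutation.
    Z = whiten(X)
    angles = np.linspace(0.0, np.pi / 2, n_angles, endpoint=False)
    scores = [
        sum(abs(excess_kurtosis(col)) for col in (Z @ rotation(t)).T)
        for t in angles
    ]
    return Z @ rotation(angles[int(np.argmax(scores))])

def alignment(est, true):
    # Mean over true concepts of the best absolute correlation
    # with any estimated component (invariant to sign and order).
    d = true.shape[1]
    corr = np.abs(np.corrcoef(est.T, true.T)[:d, d:])
    return corr.max(axis=0).mean()

rng = np.random.default_rng(0)
S = rng.uniform(-1.0, 1.0, size=(5000, 2))   # independent, non-Gaussian concepts
A = np.array([[1.0, 0.6], [0.4, 1.0]])       # assumed linear mixing ("embedding")
X = S @ A.T

print("alignment after whitening only:", alignment(whiten(X), S))
print("alignment after ICA rotation:  ", alignment(ica_2d(X), S))
```

The alignment score here is a simple correlation-based proxy and is not necessarily the metric used in the paper. The non-Gaussianity assumption matters: for Gaussian concepts every rotation of the whitened data looks the same, so the kurtosis criterion has no preferred direction.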

updated: Thu May 25 2023 16:10:42 GMT+0000 (UTC)

published: Tue Jun 28 2022 10:21:17 GMT+0000 (UTC)

arXiv
