Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

Tilman Räuker; Anson Ho; Stephen Casper; Dylan Hadfield-Menell

The last decade of machine learning has seen drastic increases in scale and capabilities. Deep neural networks (DNNs) are increasingly being deployed in the real world. However, they are generally difficult to analyze, raising concerns about using them without a rigorous understanding of how they function. Effective tools for interpreting them will be important for building more trustworthy AI by helping to identify failures, fix bugs, and improve basic understanding. In particular, "inner" interpretability techniques, which focus on explaining the internal components of DNNs, are well-suited for developing a mechanistic understanding, guiding manual modifications, and reverse engineering solutions. Much recent work has focused on DNN interpretability, and rapid progress has thus far made a thorough systematization of methods difficult. In this survey, we review over 300 works with a focus on inner interpretability tools. We introduce a taxonomy that classifies methods by what part of the network they help to explain (weights, neurons, subnetworks, or latent representations) and whether they are implemented during (intrinsic) or after (post hoc) training. To our knowledge, we are also the first to survey a number of connections between interpretability research and work in adversarial robustness, continual learning, modularity, network compression, and studying the human visual system. Finally, we discuss key challenges and argue for future work emphasizing diagnostics, benchmarking, and robustness.
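The taxonomy described above has two axes: which part of the network a method helps to explain, and whether it is applied during or after training. Below is a minimal sketch of that two-axis grid as a data structure. It uses only the four target categories and two timing categories named in the abstract. The names `Target`, `Timing`, `Method`, `taxonomy_grid`, and the example method name are hypothetical illustrations and are not taken from the survey.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Target(Enum):
    """Which part of the network a method helps to explain (per the abstract)."""
    WEIGHTS = "weights"
    NEURONS = "neurons"
    SUBNETWORKS = "subnetworks"
    LATENT_REPRESENTATIONS = "latent representations"


class Timing(Enum):
    """When the method is implemented relative to training."""
    INTRINSIC = "intrinsic"  # implemented during training
    POST_HOC = "post hoc"    # implemented after training


@dataclass(frozen=True)
class Method:
    """Hypothetical record placing one interpretability method in the grid."""
    name: str
    target: Target
    timing: Timing


def taxonomy_grid() -> dict:
    """Return one empty list per (target, timing) cell: 4 x 2 = 8 cells."""
    return {cell: [] for cell in product(Target, Timing)}


grid = taxonomy_grid()
# Hypothetical example entry, used only to show how a method is filed.
example = Method("hypothetical-neuron-probe", Target.NEURONS, Timing.POST_HOC)
grid[(example.target, example.timing)].append(example)
print(len(grid))  # 8
```

Keeping the two axes as independent enums makes every combination explicit, so an empty cell (for instance, an intrinsic method for a given target) is visible rather than silently missing.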

updated: Mon Sep 05 2022 18:35:00 GMT+0000 (UTC)

published: Wed Jul 27 2022 01:59:13 GMT+0000 (UTC)

arXiv