Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

Tilman Räuker; Anson Ho; Stephen Casper; Dylan Hadfield-Menell

The last decade of machine learning has seen drastic increases in scale and capabilities. Deep neural networks (DNNs) are increasingly being deployed in the real world. However, they are difficult to analyze, raising concerns about using them without a rigorous understanding of how they function. Effective tools for interpreting them will be important for building more trustworthy AI by helping to identify problems, fix bugs, and improve basic understanding. In particular, "inner" interpretability techniques, which focus on explaining the internal components of DNNs, are well-suited for developing a mechanistic understanding, guiding manual modifications, and reverse engineering solutions. Much recent work has focused on DNN interpretability, and rapid progress has thus far made a thorough systematization of methods difficult. In this survey, we review over 300 works with a focus on inner interpretability tools. We introduce a taxonomy that classifies methods by what part of the network they help to explain (weights, neurons, subnetworks, or latent representations) and whether they are implemented during (intrinsic) or after (post hoc) training. To our knowledge, we are also the first to survey a number of connections between interpretability research and work in adversarial robustness, continual learning, modularity, network compression, and studying the human visual system. We discuss key challenges and argue that the status quo in interpretability research is largely unproductive. Finally, we highlight the importance of future work that emphasizes diagnostics, debugging, adversaries, and benchmarking in order to make interpretability tools more useful to engineers in practical applications.

updated: Fri Aug 18 2023 21:14:43 GMT+0000 (UTC)

published: Wed Jul 27 2022 01:59:13 GMT+0000 (UTC)

arXiv
