Don't trust your eyes: on the (un)reliability of feature visualizations

Robert Geirhos; Roland S. Zimmermann; Blair Bilodeau; Wieland Brendel; Been Kim

How do neural networks extract patterns from pixels? Feature visualizations attempt to answer this important question by visualizing highly activating patterns through optimization. Today, visualization methods form the foundation of our knowledge about the internal workings of neural networks, as a type of mechanistic interpretability. Here we ask: How reliable are feature visualizations? We start our investigation by developing network circuits that trick feature visualizations into showing arbitrary patterns that are completely disconnected from normal network behavior on natural input. We then provide evidence for a similar phenomenon occurring in standard, unmanipulated networks: feature visualizations are processed very differently from standard input, casting doubt on their ability to "explain" how neural networks process natural images. We underpin this empirical finding by theory proving that the set of functions that can be reliably understood by feature visualization is extremely small and does not include general black-box neural networks. Therefore, a promising way forward could be the development of networks that enforce certain structures in order to ensure more reliable feature visualizations.
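For readers unfamiliar with the technique, "visualizing highly activating patterns through optimization" can be sketched as gradient ascent on the input itself. The block below is a minimal illustration under stated assumptions and is not the authors' code: `toy_unit`, `numerical_gradient`, and `activation_maximization` are hypothetical names, the "unit" is a hand-written linear function rather than a real network, the gradient is estimated by finite differences instead of backpropagation, and the unit-norm constraint stands in for the regularizers that practical methods use.

```python
import math
import random


def numerical_gradient(f, x, eps=1e-5):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def activation_maximization(unit, dim, steps=200, lr=0.1, seed=0):
    """Gradient ascent on the input to maximize unit(x), keeping ||x|| <= 1."""
    rng = random.Random(seed)
    # Start from small random noise, as feature visualization typically does.
    x = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
    for _ in range(steps):
        grad = numerical_gradient(unit, x)
        # Ascent step: move the input in the direction that raises the activation.
        x = [xi + lr * gi for xi, gi in zip(x, grad)]
        # Project back onto the unit L2 ball so the input stays bounded.
        norm = math.sqrt(sum(xi * xi for xi in x))
        if norm > 1.0:
            x = [xi / norm for xi in x]
    return x


def toy_unit(x):
    """Hypothetical stand-in for one unit's activation (an assumption)."""
    return 3.0 * x[0] - 4.0 * x[1]


pattern = activation_maximization(toy_unit, dim=3)
print([round(v, 3) for v in pattern], round(toy_unit(pattern), 3))
```

For this toy unit the optimized input converges toward the normalized weight direction (3, -4, 0)/5. Note that the result depends only on what the optimizer can reach; it says nothing about how the unit responds to natural inputs, which is the gap the abstract questions.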

updated: Wed Jun 07 2023 18:31:39 GMT+0000 (UTC)

published: Wed Jun 07 2023 18:31:39 GMT+0000 (UTC)

arXiv
