Understanding Neural Networks and Individual Neuron Importance via Information-Ordered Cumulative Ablation

Rana Ali Amjad; Kairen Liu; Bernhard C. Geiger

In this work, we investigate the use of three information-theoretic quantities -- entropy, mutual information with the class variable, and a class selectivity measure based on Kullback-Leibler divergence -- to understand and study the behavior of already trained, fully connected feed-forward neural networks. We analyze the connection between these quantities and classification performance on the test set by cumulatively ablating neurons in networks trained on MNIST, FashionMNIST, and CIFAR-10. Our results parallel those recently published by Morcos et al., indicating that class selectivity is not a good indicator of classification performance. However, when individual layers are examined separately, both mutual information and class selectivity are positively correlated with classification performance, at least for networks with ReLU activation functions. We provide explanations for this phenomenon and conclude that it is ill-advised to compare the proposed information-theoretic quantities across layers. Furthermore, we show that cumulatively ablating neurons in ascending or descending order of these quantities can be used to formulate hypotheses about the joint behavior of multiple neurons, such as redundancy and synergy, at comparatively low computational cost. We also draw connections to the information bottleneck theory for neural networks.
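The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: neuron outputs are quantized to one bit at a per-neuron median threshold, class selectivity is taken as the largest Kullback-Leibler divergence between a class-conditional output distribution and the marginal one, and `evaluate` is a hypothetical user-supplied callback that returns test accuracy for a given keep-mask over the neurons of one layer. The names `binarize`, `neuron_measures`, and `cumulative_ablation` are likewise illustrative.

```python
import numpy as np

def binarize(acts, thresholds=None):
    """Quantize activations of shape (n_samples, n_neurons) to one bit each.
    Assumption: the per-neuron median is used when no threshold is given."""
    if thresholds is None:
        thresholds = np.median(acts, axis=0)
    return (acts > thresholds).astype(int)

def _kl_bits(p, q, eps=1e-12):
    """KL divergence D(p || q) in bits for two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p = 0 contribute nothing
    return float(np.sum(p[mask] * np.log2(p[mask] / np.maximum(q[mask], eps))))

def neuron_measures(t, y, n_classes):
    """Entropy, mutual information with the class, and a KL-based selectivity
    for one neuron with binary outputs t and integer class labels y."""
    p_t = np.bincount(t, minlength=2) / len(t)
    entropy = -sum(p * np.log2(p) for p in p_t if p > 0)
    mutual_info, selectivity = 0.0, 0.0
    for c in range(n_classes):
        idx = (y == c)
        if not idx.any():
            continue
        p_t_given_c = np.bincount(t[idx], minlength=2) / idx.sum()
        d = _kl_bits(p_t_given_c, p_t)
        mutual_info += idx.mean() * d      # I(T;Y) = sum_y P(y) D(P_{T|y} || P_T)
        selectivity = max(selectivity, d)  # assumed form: max over classes
    return entropy, mutual_info, selectivity

def cumulative_ablation(scores, evaluate, ascending=True):
    """Ablate the neurons of one layer one after another, ordered by score,
    and record the value of evaluate(keep_mask) after every step."""
    order = np.argsort(scores)
    if not ascending:
        order = order[::-1]
    keep = np.ones(len(scores), dtype=bool)
    accuracies = [evaluate(keep.copy())]   # baseline with nothing ablated
    for j in order:
        keep[j] = False
        accuracies.append(evaluate(keep.copy()))
    return order, accuracies
```

In use, one would compute a score per neuron within a single layer (for example the mutual information returned by `neuron_measures`), pass the scores to `cumulative_ablation` once with `ascending=True` and once with `ascending=False`, and compare the two accuracy curves. How an ablated neuron is represented inside `evaluate` (for instance, by zeroing its output) is left to the caller.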

updated: Wed Jun 09 2021 15:28:37 GMT+0000 (UTC)

published: Wed Apr 18 2018 12:29:24 GMT+0000 (UTC)

arXiv
