Concept Whitening for Interpretable Image Recognition

Zhi Chen; Yijie Bei; Cynthia Rudin

What does a neural network encode about a concept as we traverse its layers? Interpretability in machine learning is undoubtedly important, but the calculations of neural networks are very challenging to understand. Attempts to see inside their hidden layers can be misleading or unusable, or can rely on the latent space possessing properties that it may not have. In this work, rather than attempting to analyze a neural network post hoc, we introduce a mechanism, called concept whitening (CW), that alters a given layer of the network so that we can better understand the computation leading up to that layer. When a concept whitening module is added to a CNN, the axes of the latent space are aligned with known concepts of interest. Experimentally, we show that CW can provide a much clearer understanding of how the network gradually learns concepts over layers. CW is an alternative to a batch normalization layer in that it normalizes and also decorrelates (whitens) the latent space. CW can be used in any layer of the network without hurting predictive performance.
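To make the phrase "normalizes and also decorrelates (whitens)" concrete, here is a minimal sketch of whitening applied to toy two-dimensional activations. It is an illustration only, not the CW module: the abstract does not say which whitening transform is used, so Cholesky whitening is assumed here simply because it is short to write out, and the concept-alignment step is not shown. All function names and the toy data are hypothetical.

```python
import math
import random


def mean_and_covariance(rows):
    """Return the per-feature mean and the (biased) covariance matrix of rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
            for j in range(d)] for i in range(d)]
    return mean, cov


def cholesky(a, eps=1e-8):
    """Lower-triangular L with L L^T = a + eps*I (eps is an assumed stabilizer)."""
    d = len(a)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] + eps - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def whiten(rows):
    """Center each row, then solve L z = (x - mean) by forward substitution."""
    mean, cov = mean_and_covariance(rows)
    L = cholesky(cov)
    out = []
    for r in rows:
        x = [r[j] - mean[j] for j in range(len(r))]
        z = []
        for i in range(len(x)):
            z.append((x[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
        out.append(z)
    return out


random.seed(0)
# Toy "activations": the second feature is strongly correlated with the first.
acts = []
for _ in range(500):
    a = random.gauss(0, 1)
    acts.append([2.0 * a + 1.0, a + 0.3 * random.gauss(0, 1)])

white = whiten(acts)
_, cov_after = mean_and_covariance(white)
```

After whitening, `cov_after` is close to the identity matrix: each feature has unit variance and the features are uncorrelated. Batch normalization alone would only standardize each feature separately and leave the correlation in place, which is the distinction the abstract draws.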

updated: Mon Dec 07 2020 19:09:35 GMT+0000 (UTC)

published: Wed Feb 05 2020 05:28:09 GMT+0000 (UTC)

arXiv
