What Do Deep Nets Learn? Class-wise Patterns Revealed in the Input Space

Shihao Zhao; Xingjun Ma; Yisen Wang; James Bailey; Bo Li; Yu-Gang Jiang

Deep neural networks (DNNs) are increasingly deployed in different applications to achieve state-of-the-art performance. However, they are often applied as black boxes, with limited understanding of what knowledge the model has learned from the data. In this paper, we focus on image classification and propose a method to visualize and understand the class-wise knowledge (patterns) learned by DNNs under three settings: natural, backdoor, and adversarial. Unlike existing visualization methods, our method searches for a single predictive pattern in the pixel space to represent the knowledge learned by the model for each class. Based on the proposed method, we show that DNNs trained on natural (clean) data learn abstract shapes along with some texture, and that backdoored models learn a suspicious pattern for the backdoored class. Interestingly, the fact that DNNs can learn a single predictive pattern for each class indicates that DNNs can learn a backdoor even from clean data, and that the pattern itself is a backdoor trigger. In the adversarial setting, we show that adversarially trained models tend to learn more simplified shape patterns. Our method can serve as a useful tool to better understand the knowledge learned by DNNs on different datasets under different settings.
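The abstract does not spell out how the pixel-space search is carried out, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' procedure. It assumes a toy linear softmax classifier in place of a DNN, a fixed blending weight `alpha` that mixes one shared pattern into every image, and plain projected gradient descent on the cross-entropy toward the target class. The names `blend`, `target_loss`, and `search_class_pattern`, and all parameter values, are hypothetical.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def blend(images, pattern, alpha):
    # Assumption: the pattern is mixed into each image with a fixed weight.
    return (1.0 - alpha) * images + alpha * pattern


def target_loss(W, b, images, pattern, target, alpha):
    # Mean cross-entropy of the blended images with respect to the target class.
    probs = softmax(blend(images, pattern, alpha) @ W.T + b)
    return float(-np.log(probs[:, target]).mean())


def search_class_pattern(W, b, images, target, alpha=0.3, lr=0.5, steps=200):
    """Search for one pattern that pushes all blended images toward `target`.

    W: (num_classes, num_pixels) weights of a toy linear classifier (assumed).
    b: (num_classes,) biases.
    images: (num_images, num_pixels) flattened images with values in [0, 1].
    """
    num_classes, num_pixels = W.shape
    pattern = np.full(num_pixels, 0.5)  # start from a uniform gray pattern
    onehot = np.zeros(num_classes)
    onehot[target] = 1.0
    for _ in range(steps):
        probs = softmax(blend(images, pattern, alpha) @ W.T + b)
        # Gradient of the mean cross-entropy with respect to the shared pattern.
        grad = alpha * ((probs - onehot) @ W).mean(axis=0)
        # Gradient step, then projection back to the valid pixel range.
        pattern = np.clip(pattern - lr * grad, 0.0, 1.0)
    return pattern
```

Calling `search_class_pattern` once per class yields one pattern per class, which can then be reshaped to the image dimensions and inspected visually. For a real DNN, the hand-written gradient would be replaced by automatic differentiation; the step size used here is only suited to the small toy model.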

updated: Sat Feb 06 2021 05:09:40 GMT+0000 (UTC)

published: Mon Jan 18 2021 06:38:41 GMT+0000 (UTC)

arXiv
