Representation Quality Of Neural Networks Links To Adversarial Attacks and Defences

Shashank Kotyan; Danilo Vasconcellos Vargas; Moe Matsuki

Neural networks have been shown to be vulnerable to a variety of adversarial algorithms. A crucial step towards understanding this lack of robustness is to assess how well a neural network's representation can encode the existing features. Here, we propose a method to understand the representation quality of neural networks using a novel test based on Zero-Shot Learning, entitled Raw Zero-Shot. The principal idea is that if an algorithm learns rich features, those features should be able to interpret "unknown" classes as an aggregate of previously learned features. This is because unknown classes usually share several regular features with recognised classes, provided the learned features are general enough. We further introduce two metrics to assess how well these learned features interpret unknown classes. One is based on an inter-cluster validation technique (the Davies-Bouldin Index), and the other is based on the distance to an approximated ground truth. Experiments suggest that adversarial defences improve the representation of the classifiers, which in turn suggests that improving the robustness of a classifier also requires improving its representation quality. Experiments also reveal a strong association (a high Pearson correlation and a low p-value) between the metrics and adversarial attacks. Interestingly, the results indicate that dynamic routing networks such as CapsNet have better representations, while current deeper neural networks are trading off representation quality for accuracy. Code is available at http://bit.ly/RepresentationMetrics.
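
For readers unfamiliar with the Davies-Bouldin Index named above, the following is a minimal sketch of its standard textbook definition in plain Python. It is not taken from the authors' code. The abstract does not say how the index is applied to the Raw Zero-Shot outputs, so the input here is only an assumption: each cluster is a hypothetical list of soft-label vectors, and the function names `centroid`, `euclidean` and `davies_bouldin` are illustrative.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(clusters):
    """Standard Davies-Bouldin Index; lower means tighter, better-separated clusters.

    `clusters` is a list of clusters, each a list of vectors. It must hold at
    least two clusters, and no two clusters may share the same centroid.
    """
    centroids = [centroid(c) for c in clusters]
    # Scatter of a cluster: mean distance of its points to its centroid.
    scatters = [
        sum(euclidean(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centroids)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity ratio between cluster i and any other cluster.
        total += max(
            (scatters[i] + scatters[j]) / euclidean(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical soft-label vectors (assumed data, for illustration only).
tight = [[[0.9, 0.1], [0.8, 0.2]], [[0.1, 0.9], [0.2, 0.8]]]
loose = [[[0.6, 0.4], [0.4, 0.6]], [[0.5, 0.5], [0.3, 0.7]]]
tight_score = davies_bouldin(tight)
loose_score = davies_bouldin(loose)
```

With these assumed inputs, the well-separated clusters in `tight` score lower than the overlapping ones in `loose`, which is the sense in which a lower index indicates a cleaner clustering.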

updated: Thu Jul 16 2020 14:49:14 GMT+0000 (UTC)

published: Sat Jun 15 2019 23:32:33 GMT+0000 (UTC)

arXiv
