Interpreting and Improving Adversarial Robustness of Deep Neural Networks with Neuron Sensitivity

Chongzhi Zhang; Aishan Liu; Xianglong Liu; Yitao Xu; Hang Yu; Yuqing Ma; Tianlin Li

Deep neural networks (DNNs) are vulnerable to adversarial examples, in which inputs with imperceptible perturbations mislead DNNs into producing incorrect results. Despite the potential risks they pose, adversarial examples are also valuable for providing insights into the weaknesses and blind spots of DNNs. Interpreting a DNN in the adversarial setting therefore aims to explain the rationale behind its decision-making process and to build a deeper understanding that leads to better practical applications. To this end, we explain the adversarial robustness of deep models from a new perspective, neuron sensitivity, which is measured by the intensity of variation in neuron behavior between benign and adversarial examples. In this paper, we first establish a close connection between adversarial robustness and neuron sensitivity, as sensitive neurons make the most non-trivial contributions to model predictions in the adversarial setting. Based on that, we further propose to improve adversarial robustness by constraining the similarity of sensitive neurons between benign and adversarial examples, which stabilizes the behavior of sensitive neurons under adversarial noise. Moreover, we demonstrate that state-of-the-art adversarial training methods improve model robustness by reducing neuron sensitivity, which in turn confirms the strong connection between adversarial robustness and neuron sensitivity as well as the effectiveness of using sensitive neurons to build robust models. Extensive experiments on various datasets demonstrate that our algorithm effectively achieves excellent results.
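The idea of measuring neuron sensitivity and then stabilizing the most sensitive neurons can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything here is an assumption for illustration: sensitivity is taken as the mean absolute activation change per neuron over paired benign and adversarial inputs, and the function names `neuron_sensitivity`, `top_k_sensitive`, and `stability_penalty` are hypothetical, not the authors' implementation.

```python
from typing import List, Sequence

Activations = Sequence[Sequence[float]]  # rows: examples, columns: neurons


def neuron_sensitivity(benign: Activations, adversarial: Activations) -> List[float]:
    """Assumed metric: mean absolute activation change per neuron."""
    if not benign or len(benign) != len(adversarial):
        raise ValueError("need the same non-zero number of benign and adversarial rows")
    n_neurons = len(benign[0])
    totals = [0.0] * n_neurons
    for b_row, a_row in zip(benign, adversarial):
        if len(b_row) != n_neurons or len(a_row) != n_neurons:
            raise ValueError("every row must have one value per neuron")
        for j, (b, a) in enumerate(zip(b_row, a_row)):
            totals[j] += abs(b - a)
    return [t / len(benign) for t in totals]


def top_k_sensitive(sensitivity: Sequence[float], k: int) -> List[int]:
    """Indices of the k neurons with the largest sensitivity."""
    order = sorted(range(len(sensitivity)), key=lambda j: sensitivity[j], reverse=True)
    return order[:k]


def stability_penalty(benign: Activations, adversarial: Activations,
                      neuron_ids: Sequence[int]) -> float:
    """Assumed regularizer: mean absolute gap restricted to the chosen neurons."""
    total = 0.0
    for b_row, a_row in zip(benign, adversarial):
        total += sum(abs(b_row[j] - a_row[j]) for j in neuron_ids)
    return total / (len(benign) * len(neuron_ids))


# Toy activations for one layer with three neurons and two paired examples.
benign = [[1.0, 0.0, 2.0], [1.0, 1.0, 2.0]]
adversarial = [[1.0, 2.0, 2.5], [1.0, 0.0, 1.5]]

sens = neuron_sensitivity(benign, adversarial)
sensitive_ids = top_k_sensitive(sens, k=2)
penalty = stability_penalty(benign, adversarial, sensitive_ids)
```

In a training loop, a term like `penalty` would be added to the task loss so that the selected neurons behave similarly on benign and adversarial inputs. The choice of distance and of `k` is left open here.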

updated: Mon Nov 30 2020 15:13:09 GMT+0000 (UTC)

published: Mon Sep 16 2019 04:10:13 GMT+0000 (UTC)

arXiv
