Investigating Vulnerabilities of Deep Neural Policies

Ezgi Korkmaz

Reinforcement learning policies based on deep neural networks are vulnerable to imperceptible adversarial perturbations to their inputs, in much the same way as neural network image classifiers. Recent work has proposed several methods to improve the robustness of deep reinforcement learning agents to adversarial perturbations by training in the presence of these imperceptible perturbations (i.e., adversarial training). In this paper, we study the effects of adversarial training on the neural policy learned by the agent. In particular, we follow two distinct, parallel approaches to investigate the outcomes of adversarial training on deep neural policies, based on worst-case distributional shift and feature sensitivity. For the first approach, we compare the Fourier spectrum of minimal perturbations computed for both adversarially trained and vanilla-trained neural policies. Via experiments in the OpenAI Atari environments, we show that minimal perturbations computed for adversarially trained policies are more focused on lower frequencies in the Fourier domain, indicating a higher sensitivity of these policies to low-frequency perturbations. For the second approach, we propose a novel method to measure the feature sensitivities of deep neural policies, and we compare the feature sensitivities of state-of-the-art adversarially trained deep neural policies with those of vanilla-trained deep neural policies. We believe our results can be an initial step towards understanding the relationship between adversarial training and different notions of robustness for neural policies.
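
The first approach compares Fourier spectra of perturbations. As a rough illustration only, and not the paper's actual procedure, the sketch below measures what fraction of a 2D perturbation's spectral energy lies near the zero frequency. The function name `low_frequency_energy_fraction`, the `radius` cutoff, and the 84x84 frame size are all assumptions made for this example; the synthetic "perturbations" stand in for real minimal perturbations, which the abstract does not specify how to compute.

```python
import numpy as np

def low_frequency_energy_fraction(perturbation, radius):
    """Fraction of spectral energy within `radius` of the zero frequency.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # 2D FFT, with the zero-frequency component shifted to the array centre.
    spectrum = np.fft.fftshift(np.fft.fft2(perturbation))
    energy = np.abs(spectrum) ** 2

    # After fftshift, the zero frequency sits at index (h // 2, w // 2).
    h, w = perturbation.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)

    total = energy.sum()
    if total == 0:
        return 0.0
    return float(energy[dist <= radius].sum() / total)

# Synthetic stand-ins for perturbations (assumed 84x84 frames).
rng = np.random.default_rng(0)
h = w = 84
_, xx = np.mgrid[:h, :w]
smooth = np.sin(2 * np.pi * xx / w)    # one cycle across the frame: low frequency
noisy = rng.standard_normal((h, w))    # white noise: energy spread over all frequencies

smooth_fraction = low_frequency_energy_fraction(smooth, radius=4)
noisy_fraction = low_frequency_energy_fraction(noisy, radius=4)
```

Under these assumptions, `smooth_fraction` should be close to 1 and `noisy_fraction` close to 0. A comparison in the spirit of the abstract would apply a measure like this to perturbations computed for adversarially trained and vanilla-trained policies; a larger fraction would indicate a perturbation concentrated at lower frequencies.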

updated: Mon Aug 30 2021 10:04:50 GMT+0000 (UTC)

published: Mon Aug 30 2021 10:04:50 GMT+0000 (UTC)

arXiv
