Quantifying the Preferential Direction of the Model Gradient in Adversarial Training With Projected Gradient Descent

Ricardo Bigolin Lanfredi; Joyce D. Schroeder; Tolga Tasdizen

Adversarial training, especially projected gradient descent (PGD), has been a successful approach for improving robustness against adversarial attacks. After adversarial training, gradients of models with respect to their inputs have a preferential direction. However, the direction of alignment is not mathematically well established, making it difficult to evaluate quantitatively. We propose a novel definition of this direction as the direction of the vector pointing toward the closest point of the support of the closest inaccurate class in decision space. To evaluate the alignment with this direction after adversarial training, we apply a metric that uses generative adversarial networks to produce the smallest residual needed to change the class present in the image. We show that PGD-trained models have a higher alignment than the baseline according to our definition, that our metric presents higher alignment values than a competing metric formulation, and that enforcing this alignment increases the robustness of models.
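The alignment described above can be illustrated with a minimal sketch. It assumes alignment is scored as the cosine similarity between the model's input gradient and a direction vector, and it replaces the GAN-generated residual with a simple stand-in: the vector from the input to the nearest sample of another class. The function names, the toy linear scorer, and the sample points are all hypothetical and not taken from the paper.

```python
import numpy as np

def cosine_alignment(gradient, direction, eps=1e-12):
    """Cosine similarity between two arrays, compared as flat vectors."""
    g = np.asarray(gradient, dtype=float).ravel()
    d = np.asarray(direction, dtype=float).ravel()
    return float(g @ d / (np.linalg.norm(g) * np.linalg.norm(d) + eps))

def direction_to_nearest_other_class(x, other_class_points):
    """Vector from x to the nearest sample of another class.

    Assumption: a nearest-sample search stands in for the GAN-produced
    minimal residual described in the abstract.
    """
    pts = np.asarray(other_class_points, dtype=float)
    dists = np.linalg.norm(pts - x, axis=1)
    return pts[np.argmin(dists)] - x

# Toy linear scorer f(x) = w @ x, so its gradient with respect to x is w.
w = np.array([1.0, 0.0])
x = np.array([0.0, 0.0])
other_class = np.array([[2.0, 0.0], [3.0, 4.0]])

direction = direction_to_nearest_other_class(x, other_class)
aligned = cosine_alignment(w, direction)
orthogonal = cosine_alignment(np.array([0.0, 1.0]), direction)
print(aligned, orthogonal)
```

In this toy setup, a gradient that points straight at the nearest other-class sample scores close to 1, and a gradient perpendicular to it scores 0. A real evaluation would use the gradient of a trained network and a learned residual in place of the nearest-sample search.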

updated: Tue Jun 15 2021 23:49:44 GMT+0000 (UTC)

published: Thu Sep 10 2020 07:48:42 GMT+0000 (UTC)

arXiv
