Quantifying the Preferential Direction of the Model Gradient in Adversarial Training With Projected Gradient Descent

Ricardo Bigolin Lanfredi; Joyce D. Schroeder; Tolga Tasdizen

Adversarial training, especially projected gradient descent (PGD), has proven to be a successful approach for improving robustness against adversarial attacks. After adversarial training, gradients of models with respect to their inputs have a preferential direction. However, the direction of alignment is not mathematically well established, making it difficult to evaluate quantitatively. We propose a novel definition of this direction as the direction of the vector pointing toward the closest point of the support of the closest inaccurate class in decision space. To evaluate the alignment with this direction after adversarial training, we apply a metric that uses generative adversarial networks to produce the smallest residual needed to change the class present in the image. We show that PGD-trained models have a higher alignment than the baseline according to our definition, that our metric presents higher alignment values than a competing metric formulation, and that enforcing this alignment increases the robustness of models.
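As a rough illustration of what "alignment" means here, the toy sketch below computes the cosine similarity between a model's input gradient and the vector pointing from an input to the closest point of a different class. Everything in it is an assumption made for illustration: the logistic model, the use of the loss gradient as the compared quantity, the small labeled point set standing in for the class support, and the helper names `nearest_other_class_direction`, `input_gradient_logistic`, and `cosine_alignment`. The abstract describes a metric based on generative adversarial networks, which this sketch does not attempt to reproduce.

```python
import numpy as np


def nearest_other_class_direction(x, label, points, labels):
    """Vector from x to the closest sample whose label differs from `label`.

    Assumption: a finite labeled sample stands in for the class support.
    """
    candidates = points[labels != label]
    dists = np.linalg.norm(candidates - x, axis=1)
    return candidates[np.argmin(dists)] - x


def input_gradient_logistic(x, label, w, b):
    """Gradient of binary cross-entropy w.r.t. x for p = sigmoid(w.x + b)."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (p - label) * w


def cosine_alignment(u, v, eps=1e-12):
    """Cosine similarity between two vectors; eps guards against zero norms."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))


# Hypothetical 2-D data: class 0 near the origin, class 1 further right.
points = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [5.0, 5.0]])
labels = np.array([0, 0, 1, 1])

# Hypothetical logistic model whose decision boundary is the line x0 = 1.5.
w = np.array([1.0, 0.0])
b = -1.5

x = points[0]
direction = nearest_other_class_direction(x, 0, points, labels)
grad = input_gradient_logistic(x, 0, w, b)
alignment = cosine_alignment(grad, direction)
print(f"alignment = {alignment:.4f}")
```

In this toy setup the gradient and the direction to the nearest other-class point are parallel, so the alignment is close to 1. For a real image classifier the gradient would come from automatic differentiation, and the target direction would have to be estimated rather than read from a point set.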

updated: Thu Dec 15 2022 23:35:23 GMT+0000 (UTC)

published: Thu Sep 10 2020 07:48:42 GMT+0000 (UTC)

arXiv
