Understanding and Increasing Efficiency of Frank-Wolfe Adversarial Training

Theodoros Tsiligkaridis; Jay Roberts

Deep neural networks are easily fooled by small perturbations known as adversarial attacks. Adversarial Training (AT) approximately solves a robust optimization problem to minimize the worst-case loss and is widely regarded as the most effective defense. Because generating strong adversarial examples during AT is computationally expensive, single-step approaches have been proposed to reduce training time. However, these methods suffer from catastrophic overfitting, in which adversarial accuracy drops during training; the improvements proposed so far increase training time, and the resulting robustness remains far from that of multi-step AT. We develop a theoretical framework for adversarial training with Frank-Wolfe (FW) optimization (FW-AT) that reveals a geometric connection between the loss landscape and the ℓ_2 distortion of ℓ_∞ FW attacks. We show analytically that high distortion of FW attacks is equivalent to small gradient variation along the attack path. We then demonstrate experimentally, on various deep neural network architectures, that ℓ_∞ attacks against robust models achieve near-maximal distortion, while attacks against standard networks have lower distortion. We also show experimentally that catastrophic overfitting is strongly correlated with low distortion of FW attacks. This mathematical transparency differentiates FW from Projected Gradient Descent (PGD) optimization. To demonstrate the utility of our theoretical framework, we develop FW-AT-Adapt, a novel adversarial training algorithm that uses a simple distortion measure to adapt the number of attack steps during training, increasing efficiency without compromising robustness. FW-AT-Adapt provides training time on par with single-step fast AT methods and closes the gap between fast AT methods and multi-step PGD-AT with minimal loss in adversarial accuracy in white-box and black-box settings.
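
The link between distortion and gradient variation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the step-size schedule `c / (c + k)`, the normalization of the ℓ_2 norm by `eps * sqrt(d)`, the toy gradients `flat_grad` and `curved_grad`, and the threshold rule in `choose_steps` are all assumptions made for illustration. The sketch runs a Frank-Wolfe ascent over an ℓ_∞ ball and reports how close the resulting perturbation is to the maximal ℓ_2 norm.

```python
import math

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def fw_linf_attack(grad_fn, x, eps, steps, c=2.0):
    """Frank-Wolfe ascent on the l_inf ball of radius eps around x.

    grad_fn maps a point to the gradient of the loss at that point.
    The schedule gamma_k = c / (c + k) is an assumed choice.
    """
    delta = [0.0] * len(x)
    for k in range(steps):
        g = grad_fn([xi + di for xi, di in zip(x, delta)])
        # Linear maximization over the l_inf ball picks the vertex eps * sign(g).
        s = [eps * sign(gi) for gi in g]
        gamma = c / (c + k)  # equals 1 at k = 0, so the first step lands on a vertex
        # A convex combination of points in the ball stays in the ball,
        # so no projection step is needed.
        delta = [di + gamma * (si - di) for di, si in zip(delta, s)]
    return delta

def distortion_ratio(delta, eps):
    """l_2 norm of delta divided by its maximum over the l_inf ball, eps * sqrt(d)."""
    l2 = math.sqrt(sum(d * d for d in delta))
    return l2 / (eps * math.sqrt(len(delta)))

def choose_steps(ratio, threshold=0.9, few=1, many=4):
    """Hypothetical adaptation rule: use few attack steps when distortion is high."""
    return few if ratio >= threshold else many

def flat_grad(z):
    """Toy gradient of a linear loss: its signs never change along the path."""
    return [1.0, -2.0, 0.5]

def curved_grad(z):
    """Toy gradient of a concave loss peaked at 0.01: its signs flip inside the ball."""
    return [0.01 - zi for zi in z]
```

With `flat_grad`, every step points to the same vertex of the ball, so the perturbation keeps the maximal ℓ_2 norm. With `curved_grad`, the gradient signs flip after the first step, the iterate is pulled back toward the interior, and the ratio falls below 1. A rule such as `choose_steps` would then spend more attack steps only in the second case.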

updated: Fri Mar 25 2022 15:54:53 GMT+0000 (UTC)

published: Tue Dec 22 2020 21:36:52 GMT+0000 (UTC)

arXiv
