Understanding Frank-Wolfe Adversarial Training

Theodoros Tsiligkaridis; Jay Roberts

Deep neural networks are easily fooled by small perturbations known as adversarial attacks. Adversarial Training (AT) is a technique that approximately solves a robust optimization problem to minimize the worst-case loss, and it is widely regarded as the most effective defense against such attacks. While projected gradient descent (PGD) has received the most attention for approximately solving the inner maximization of AT, Frank-Wolfe (FW) optimization is projection-free and can be adapted to any ℓ_p norm. A Frank-Wolfe adversarial training approach is presented and shown to provide a level of robustness competitive with PGD-AT across a variety of architectures, attacks, and datasets. Exploiting a representation of the FW attack, we derive a geometric insight: the larger the ℓ_2 norm of an ℓ_∞ attack, the less the loss gradient varies. It is then experimentally demonstrated that ℓ_∞ attacks against robust models achieve nearly the maximal possible ℓ_2 distortion, providing a new lens into the specific type of regularization that AT bestows. Using FW optimization in conjunction with robust models, we are able to generate sparse, human-interpretable counterfactual explanations without relying on expensive ℓ_1 projections.
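To make the projection-free property concrete, here is a minimal NumPy sketch of a Frank-Wolfe ℓ_∞ attack. It is an illustration under stated assumptions, not the authors' implementation: the function name `fw_linf_attack`, the callback `grad_fn`, the step-size schedule 2/(k+2), and the toy logistic model are all hypothetical choices made for this example. Each iterate is a convex combination of points inside the ℓ_∞ ball, so no projection step is needed, and the ratio computed at the end compares the attack's ℓ_2 norm with the maximal value ε√d.

```python
import numpy as np

def fw_linf_attack(grad_fn, x, eps, steps=10):
    """Frank-Wolfe ascent on a loss over the l_inf ball of radius eps around x.

    grad_fn(z) must return the gradient of the loss with respect to the input z.
    """
    delta = np.zeros_like(x)
    for k in range(steps):
        g = grad_fn(x + delta)
        # Linear maximization oracle for the l_inf ball: a vertex eps * sign(g).
        s = eps * np.sign(g)
        # Assumed step-size schedule (a common Frank-Wolfe choice).
        gamma = 2.0 / (k + 2.0)
        # Convex combination of feasible points stays feasible: no projection.
        delta = (1.0 - gamma) * delta + gamma * s
    return delta

# Toy logistic-regression "model" (hypothetical, for illustration only).
rng = np.random.default_rng(0)
w = rng.normal(size=20)
b = 0.1
x = rng.normal(size=20)
y = 1.0

def loss(z):
    logit = w @ z + b
    # Binary cross-entropy written in terms of the logit.
    return np.logaddexp(0.0, logit) - y * logit

def grad(z):
    p = 1.0 / (1.0 + np.exp(-(w @ z + b)))
    return (p - y) * w

eps = 0.1
delta = fw_linf_attack(grad, x, eps)

# Fraction of the maximal possible l_2 distortion eps * sqrt(d).
ratio = np.linalg.norm(delta) / (eps * np.sqrt(x.size))
print("l_inf norm:", np.max(np.abs(delta)))
print("l_2 ratio:", ratio)
print("loss before / after:", loss(x), loss(x + delta))
```

In this linear toy model the sign of the input gradient does not change between steps, so every oracle call returns the same vertex and the ratio comes out at (numerically) 1. For a nonlinear model the gradient signs can differ across steps, the vertices partially cancel in the average, and the ratio drops below 1, which is the link between ℓ_2 distortion and loss gradient variation that the abstract describes.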

updated: Mon Feb 08 2021 17:38:24 GMT+0000 (UTC)

published: Tue Dec 22 2020 21:36:52 GMT+0000 (UTC)

arXiv
