Boosting Adversarial Transferability by Achieving Flat Local Maxima

Zhijin Ge; Fanhua Shang; Hongying Liu; Yuanyuan Liu; Xiaosen Wang

Transfer-based attacks use adversarial examples generated on a surrogate model to attack other models, which makes them applicable in the physical world and has attracted increasing interest. Recently, various adversarial attacks have emerged to boost adversarial transferability from different perspectives. In this work, inspired by the observation that flat local minima are correlated with good generalization, we assume and empirically validate that adversarial examples at a flat local region tend to have good transferability by introducing a penalized gradient norm to the original loss function. Since directly optimizing the gradient regularization norm is computationally expensive and intractable for generating adversarial examples, we propose an approximation optimization method to simplify the gradient update of the objective function. Specifically, we randomly sample an example and adopt the first-order gradient to approximate the second-order Hessian matrix, which makes computing more efficient by interpolating two Jacobian matrices. Meanwhile, to obtain a more stable gradient direction, we randomly sample multiple examples and average their gradients to reduce the variance due to random sampling during the iterative process. Extensive experiments on the ImageNet-compatible dataset show that the proposed method can generate adversarial examples at flat local regions and significantly improves adversarial transferability on both normally trained and adversarially trained models compared with state-of-the-art attacks.
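
The abstract gives no formulas, so the following is only a minimal sketch of the idea it describes, not the authors' implementation. It assumes the penalized objective has the form J(x) - lam * ||grad J(x)||_2, replaces the Hessian-vector product in its gradient with a finite difference of two first-order gradients (which leaves an interpolation with weight delta = lam / alpha), and averages that estimate over random neighbors of the current point. The toy quadratic loss and every name and parameter (`toy_loss`, `flat_grad`, `attack`, `lam`, `alpha`, `zeta`, `eps`) are hypothetical stand-ins for illustration.

```python
import math
import random

CURV = (4.0, 1.0, 0.25)  # assumed curvatures of the toy loss


def toy_loss(x):
    """Toy stand-in for a model loss: J(x) = -0.5 * sum(c_i * x_i**2)."""
    return -0.5 * sum(c * xi * xi for c, xi in zip(CURV, x))


def toy_grad(x):
    """Analytic gradient of toy_loss."""
    return [-c * xi for c, xi in zip(CURV, x)]


def l2_norm(v):
    return math.sqrt(sum(vi * vi for vi in v))


def flat_grad(x, grad_fn, lam=0.05, alpha=0.1):
    """First-order estimate of the gradient of J(x) - lam * ||grad J(x)||_2.

    The exact gradient is g - lam * H g / ||g||. Approximating H g / ||g||
    by (g(x) - g(x - alpha * g / ||g||)) / alpha gives an interpolation of
    two gradients: (1 - delta) * g(x) + delta * g(x_back), delta = lam / alpha.
    """
    g = grad_fn(x)
    n = l2_norm(g) or 1.0  # avoid division by zero at a stationary point
    x_back = [xi - alpha * gi / n for xi, gi in zip(x, g)]
    g_back = grad_fn(x_back)
    delta = lam / alpha
    return [(1.0 - delta) * gi + delta * gb for gi, gb in zip(g, g_back)]


def sign(v):
    return (v > 0) - (v < 0)


def attack(x0, grad_fn, eps=0.3, steps=10, n_samples=8, zeta=0.1, seed=0):
    """Iterative signed ascent on the loss within an L-infinity ball of radius eps."""
    rng = random.Random(seed)
    step = eps / steps
    x = list(x0)
    for _ in range(steps):
        avg = [0.0] * len(x)
        for _ in range(n_samples):
            # Random neighbor of the current point (uniform noise in [-zeta, zeta]).
            x_near = [xi + rng.uniform(-zeta, zeta) for xi in x]
            g = flat_grad(x_near, grad_fn)
            # Averaging over neighbors reduces the variance from sampling.
            avg = [a + gi / n_samples for a, gi in zip(avg, g)]
        # Signed step, then projection back into the eps-ball around x0.
        x = [xi + step * sign(a) for xi, a in zip(x, avg)]
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
    return x


if __name__ == "__main__":
    x0 = [1.0, -1.0, 0.5]
    adv = attack(x0, toy_grad)
    print("loss before:", toy_loss(x0), "loss after:", toy_loss(adv))
```

For this quadratic toy loss the finite difference is exact, so `flat_grad` coincides with the true gradient of the penalized objective. With a real network, `grad_fn` would be a backpropagated gradient and the interpolation would only be an approximation whose quality depends on `alpha`.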

updated: Thu Jun 08 2023 14:21:02 GMT+0000 (UTC)

published: Thu Jun 08 2023 14:21:02 GMT+0000 (UTC)

arXiv
