Improving the Transferability of Adversarial Examples via Direction Tuning

Xiangyuan Yang; Jie Lin; Hanlin Zhang; Xinyu Yang; Peng Zhao

In transfer-based adversarial attacks, adversarial examples are generated using only a surrogate model and are expected to remain effective against victim models. Although considerable effort has gone into improving the transferability of adversarial examples generated by such attacks, our investigation found that the large update step length used by current transfer-based attacks causes a large deviation between the actual update direction and the steepest update direction, so the generated adversarial examples cannot converge well. Directly reducing the update step length, however, leads to serious update oscillation, so the generated adversarial examples still cannot achieve good transferability to the victim models. To address these issues, we propose a novel transfer-based attack, the direction tuning attack, which both decreases the update deviation at the large step length and mitigates the update oscillation at the small sampling step length, so that the generated adversarial examples converge well and transfer well to victim models. In addition, we propose a network pruning method that smooths the decision boundary, further decreasing the update oscillation and enhancing the transferability of the generated adversarial examples. Experimental results on ImageNet demonstrate that, compared with the latest gradient-based attacks, our method improves the average attack success rate (ASR) of the generated adversarial examples from 87.9% to 94.5% on five victim models without defenses, and from 69.1% to 76.2% on eight advanced defense methods.
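The abstract does not spell out the update rule, so the following is only a rough sketch of the general idea it describes, not the authors' implementation. It assumes the direction for each large step is chosen by summing gradients probed at several small sampling steps. The toy loss stands in for a surrogate model, and every name here (`toy_loss`, `toy_grad`, `tuned_sign_attack`, `inner_k`) is hypothetical.

```python
import math


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def toy_loss(x):
    """Toy stand-in for a surrogate model's loss (an assumption)."""
    return sum(math.sin(3.0 * xi) for xi in x)


def toy_grad(x):
    """Analytic gradient of toy_loss."""
    return [3.0 * math.cos(3.0 * xi) for xi in x]


def tuned_sign_attack(x0, grad_fn, eps=0.3, steps=10, inner_k=5):
    """Sign-gradient ascent whose large-step direction is tuned by
    gradients sampled at small inner steps (assumed scheme)."""
    alpha = eps / steps  # large (outer) step length
    x = list(x0)
    for _ in range(steps):
        probe = list(x)
        g_sum = [0.0] * len(x)
        for _ in range(inner_k):
            g = grad_fn(probe)
            # Accumulate the gradients seen along the small probing steps.
            g_sum = [s + gi for s, gi in zip(g_sum, g)]
            # Advance the probe by a small sampling step length.
            probe = [p + (alpha / inner_k) * sign(gi) for p, gi in zip(probe, g)]
        # Take one large step along the sign of the accumulated gradient.
        x = [xi + alpha * sign(s) for xi, s in zip(x, g_sum)]
        # Project back into the L-infinity ball of radius eps around x0.
        x = [min(max(xi, ci - eps), ci + eps) for xi, ci in zip(x, x0)]
    return x


x0 = [0.1, -0.4, 0.7]
adv = tuned_sign_attack(x0, toy_grad)
```

With `inner_k=1` the loop reduces to a plain iterative sign-gradient update, which makes it easy to compare the two behaviours on the same toy loss.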

updated: Mon Mar 27 2023 11:26:34 GMT+0000 (UTC)

published: Mon Mar 27 2023 11:26:34 GMT+0000 (UTC)

arXiv
