Improving the Robustness of Adversarial Attacks Using an Affine-Invariant Gradient Estimator

Wenzhao Xiang; Hang Su; Chang Liu; Yandong Guo; Shibao Zheng

As designers of artificial intelligence try to outwit hackers, both sides continue to home in on AI's inherent vulnerabilities. Deep neural networks (DNNs) are designed and trained on data drawn from particular statistical distributions, so they remain vulnerable to deceptive inputs that violate their statistical and predictive assumptions. However, most existing adversarial examples lose their malicious functionality when an affine transformation is applied to them before they are fed into a neural network. In practice, retaining that functionality is an important measure of the robustness of an adversarial attack. To help DNNs learn to defend themselves more thoroughly against attacks, we propose an affine-invariant adversarial attack that consistently produces adversarial examples that are more robust to affine transformations. For efficiency, we propose to disentangle the affine transformation into translations, rotations and dilations in the Euclidean plane, and we reformulate the latter two in polar coordinates. We then construct an affine-invariant gradient estimator by convolving the gradient at the original image with derived kernels; the estimator can be integrated with any gradient-based attack method. Extensive experiments on ImageNet, including some under physical conditions, demonstrate that, compared with alternative state-of-the-art methods, our method significantly improves the affine invariance of adversarial examples and, as a byproduct, their transferability.
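
As a rough illustration of the gradient-convolution step mentioned in the abstract, the NumPy sketch below smooths an image gradient with a 2-D kernel and then takes one sign-based step, in the style of a generic gradient-based attack. It is a sketch under stated assumptions, not the authors' implementation: the Gaussian kernel is only a placeholder for the derived kernels, which the abstract does not specify, and the names `gaussian_kernel`, `smooth_gradient` and `sign_step`, the step size `eps`, and the [0, 1] pixel range are all hypothetical choices made for this example. The polar-coordinate treatment of rotations and dilations is not shown.

```python
import numpy as np


def gaussian_kernel(size: int = 5, sigma: float = 1.0) -> np.ndarray:
    """Normalized 2-D Gaussian; a placeholder, NOT the paper's derived kernel."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def smooth_gradient(grad: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Convolve each channel of grad (H, W, C) with an odd-sized kernel.

    Uses zero padding so the output has the same shape as the input.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(grad, ((ph, ph), (pw, pw), (0, 0)))
    out = np.zeros_like(grad, dtype=float)
    h, w = grad.shape[:2]
    flipped = kernel[::-1, ::-1]  # convolution flips the kernel
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + h, j:j + w, :]
    return out


def sign_step(image: np.ndarray, grad: np.ndarray, kernel: np.ndarray,
              eps: float = 8 / 255) -> np.ndarray:
    """One sign-gradient step using the smoothed gradient (assumed pixel range [0, 1])."""
    g = smooth_gradient(grad, kernel)
    return np.clip(image + eps * np.sign(g), 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((32, 32, 3))          # stand-in image
    grad = rng.standard_normal((32, 32, 3))  # stand-in for a loss gradient
    adv = sign_step(image, grad, gaussian_kernel())
    print(adv.shape, float(np.abs(adv - image).max()))
```

In a real attack the gradient would come from backpropagating a classification loss through the target model, and the smoothing step would sit inside whichever iterative gradient-based method is being used.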

updated: Fri Apr 22 2022 07:06:17 GMT+0000 (UTC)

published: Mon Sep 13 2021 09:43:17 GMT+0000 (UTC)

arXiv
