Feature Importance-aware Transferable Adversarial Attacks

Zhibo Wang; Hengchang Guo; Zhifei Zhang; Wenxin Liu; Zhan Qin; Kui Ren

Transferability of adversarial examples is of central importance for attacking an unknown model, as it enables adversarial attacks in more practical scenarios, e.g., black-box attacks. Existing transferable attacks tend to craft adversarial examples by indiscriminately distorting features to degrade prediction accuracy in a source model, without awareness of the intrinsic features of objects in the images. We argue that such brute-force degradation introduces model-specific local optima into adversarial examples, thus limiting their transferability. By contrast, we propose the Feature Importance-aware Attack (FIA), which disrupts the important object-aware features that consistently dominate model decisions. More specifically, we obtain feature importance by introducing the aggregate gradient, which averages the gradients with respect to feature maps of the source model, computed on a batch of random transforms of the original clean image. The gradients are highly correlated with objects of interest, and this correlation is invariant across different models. In addition, the random transforms preserve intrinsic features of objects and suppress model-specific information. Finally, the feature importance guides the search for adversarial examples towards disrupting critical features, achieving stronger transferability. Extensive experimental evaluation demonstrates the effectiveness and superior performance of the proposed FIA, which improves the success rate by 8.4% against normally trained models and 11.7% against defense models compared to state-of-the-art transferable attacks. Code is available at: https://github.com/hcguoO0/FIA
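The aggregate gradient described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). The tiny two-layer model, the element-dropping transform, the importance-weighted score, and every function name and parameter value below are assumptions chosen only to make the idea concrete in pure Python. The abstract itself specifies only that gradients with respect to feature maps are averaged over a batch of random transforms of the clean image.

```python
import math
import random

def features(x, W1):
    """Toy intermediate 'feature map': one linear layer followed by ReLU (assumed model)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]

def logit_grad_wrt_features(f, w2):
    """Gradient of the toy logit sum_j w2[j] * tanh(f[j]) with respect to each feature."""
    return [w * (1.0 - math.tanh(fj) ** 2) for w, fj in zip(w2, f)]

def random_mask(x, drop_prob, rng):
    """Hypothetical random transform: zero each input element with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else xi for xi in x]

def aggregate_gradient(x, W1, w2, n_transforms=30, drop_prob=0.3, seed=0):
    """Average the feature-level gradients over random transforms of the clean input x."""
    rng = random.Random(seed)
    total = [0.0] * len(W1)
    for _ in range(n_transforms):
        f = features(random_mask(x, drop_prob, rng), W1)
        g = logit_grad_wrt_features(f, w2)
        total = [t + gj for t, gj in zip(total, g)]
    return [t / n_transforms for t in total]

def weighted_feature_score(x_adv, importance, W1):
    """Importance-weighted sum of features (assumed objective an attack would try to reduce)."""
    return sum(a * fj for a, fj in zip(importance, features(x_adv, W1)))

# Illustrative values only: a 4-element "image" and made-up weights.
x = [0.9, 0.1, 0.5, 0.7]
W1 = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1], [0.2, 0.2, -0.5, 0.4]]
w2 = [1.0, -0.5, 0.8]

importance = aggregate_gradient(x, W1, w2)
score = weighted_feature_score(x, importance, W1)
```

In this sketch, `importance` plays the role of the feature importance: features with large positive values support the decision across many transformed copies of the input, so a perturbation that lowers `score` targets those features rather than distorting all features indiscriminately.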

updated: Thu Jul 29 2021 17:13:29 GMT+0000 (UTC)

published: Thu Jul 29 2021 17:13:29 GMT+0000 (UTC)

arXiv
