Improving the Transferability of Adversarial Examples with Arbitrary Style Transfer

Zhijin Ge; Fanhua Shang; Hongying Liu; Yuanyuan Liu; Liang Wan; Wei Feng; Xiaosen Wang

Deep neural networks are vulnerable to adversarial examples crafted by applying human-imperceptible perturbations to clean inputs. Although many attack methods achieve high success rates in the white-box setting, they exhibit weak transferability in the black-box setting. Recently, various methods have been proposed to improve adversarial transferability, among which input transformation is one of the most effective. In this work, we notice that existing input transformation-based works mainly adopt transformed data from the same domain for augmentation. Inspired by domain generalization, we aim to further improve transferability using data augmented from different domains. Specifically, a style transfer network can alter the distribution of low-level visual features in an image while preserving its semantic content for humans. Hence, we propose a novel attack method named Style Transfer Method (STM) that utilizes a proposed arbitrary style transfer network to transform images into different domains. To avoid inconsistent semantic information in stylized images for the classification network, we fine-tune the style transfer network and mix up the generated images, with random noise added, with the original images to maintain semantic consistency and boost input diversity. Extensive experimental results on the ImageNet-compatible dataset show that our proposed method significantly improves adversarial transferability on both normally trained and adversarially trained models compared with state-of-the-art input transformation-based attacks. Code is available at: https://github.com/Zhijin-Ge/STM.
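The mixing step described in the abstract (a stylized image with random noise added, blended with the original image) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `mix_stylized` and `toy_style_transfer`, the mixing weight `gamma`, the noise range `noise_scale`, and the per-channel affine stand-in for the style transfer network are all assumptions made for illustration. The actual method is in the repository linked above.

```python
import numpy as np

def mix_stylized(x, stylized, gamma=0.5, noise_scale=0.05, rng=None):
    """Blend a clean image with a noisy stylized copy (values kept in [0, 1]).

    gamma and noise_scale are illustrative defaults, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Uniform noise added to the stylized image to diversify the inputs.
    noise = rng.uniform(-noise_scale, noise_scale, size=x.shape)
    # Convex combination: the original image anchors the semantic content.
    mixed = gamma * x + (1.0 - gamma) * (stylized + noise)
    return np.clip(mixed, 0.0, 1.0)

def toy_style_transfer(x, rng):
    """Hypothetical stand-in for a style transfer network.

    It only rescales and shifts each color channel, which changes low-level
    color statistics while leaving the spatial layout untouched.
    """
    channels = x.shape[-1]
    scale = rng.uniform(0.5, 1.5, size=(1, 1, channels))
    shift = rng.uniform(-0.2, 0.2, size=(1, 1, channels))
    return np.clip(x * scale + shift, 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(8, 8, 3))  # toy image, height x width x channels

# Build several augmented copies of the same image, one per sampled "style".
batch = np.stack(
    [mix_stylized(x, toy_style_transfer(x, rng), rng=rng) for _ in range(4)]
)
print(batch.shape)
```

In an input transformation-based attack, copies like these would typically be fed to the surrogate classifier and their gradients averaged before each perturbation update; that part is omitted here because the abstract does not specify it.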

updated: Mon Aug 21 2023 09:58:13 GMT+0000 (UTC)

published: Mon Aug 21 2023 09:58:13 GMT+0000 (UTC)

arXiv
