Towards Feature Space Adversarial Attack

Qiuling Xu; Guanhong Tao; Siyuan Cheng; Xiangyu Zhang

We propose a new adversarial attack on deep neural networks for image classification. Unlike most existing attacks, which directly perturb input pixels, our attack perturbs abstract features, specifically features that denote styles, including interpretable styles such as vivid colors and sharp outlines as well as uninterpretable ones. It induces model misclassification by injecting imperceptible style changes through an optimization procedure. We show that our attack can generate adversarial samples that are more natural-looking than those of state-of-the-art unbounded attacks. Our experiments also indicate that existing pixel-space adversarial attack detection and defense techniques can hardly ensure robustness in the style-related feature space.
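
The abstract does not say how style is represented or optimized, so the following is only a toy sketch under stated assumptions, not the authors' method. It assumes style can be approximated by the channel-wise mean and standard deviation of a feature map (a convention borrowed from the style-transfer literature), and it uses a hypothetical linear classifier `toy_logit` so the gradient can be written by hand. All function names and parameters (`eps`, `step`, `iters`) are illustrative.

```python
import math

def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def channel_stats(channel):
    """Mean and standard deviation of one feature-map channel."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return mean, math.sqrt(var + 1e-8)

def restyle(features, d_mean, d_log_std):
    """Shift each channel's mean and rescale its std; normalized content is unchanged."""
    out = []
    for channel, dm, ds in zip(features, d_mean, d_log_std):
        mean, std = channel_stats(channel)
        new_mean, new_std = mean + dm, std * math.exp(ds)
        out.append([(v - mean) / std * new_std + new_mean for v in channel])
    return out

def toy_logit(features, weights):
    """Hypothetical linear classifier on channel means; a positive value means class A."""
    return sum(w * channel_stats(ch)[0] for ch, w in zip(features, weights))

def style_attack(features, weights, eps=0.1, step=0.05, iters=10):
    """Projected sign-gradient descent on the mean shifts, bounded by eps per channel."""
    d_mean = [0.0] * len(features)
    d_log_std = [0.0] * len(features)  # kept at zero: the toy classifier ignores std
    for _ in range(iters):
        # For this linear toy model, d(logit)/d(d_mean[c]) equals weights[c].
        d_mean = [max(-eps, min(eps, d - step * sign(w)))
                  for d, w in zip(d_mean, weights)]
    return restyle(features, d_mean, d_log_std), d_mean

features = [[0.1, 0.3], [0.0, 0.2]]   # two channels, two spatial positions each
weights = [1.0, -1.0]                 # assumed classifier weights
adversarial, shifts = style_attack(features, weights)
print(toy_logit(features, weights), toy_logit(adversarial, weights))
```

In a realistic setting the perturbed features would be decoded back into an image and the gradient would come from automatic differentiation through the actual classifier; the bound `eps` stands in for whatever constraint keeps the style change imperceptible.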

updated: Wed Dec 16 2020 03:47:44 GMT+0000 (UTC)

published: Sun Apr 26 2020 13:56:31 GMT+0000 (UTC)

arXiv
