SATBA: An Invisible Backdoor Attack Based On Spatial Attention

Huasong Zhou; Xiaowei Xu; Xiaodong Wang; Leon Bevan Bullock

Backdoor attacks pose an emerging threat to AI security: a Deep Neural Network (DNN) is trained on a dataset into which hidden trigger patterns have been injected. The poisoned model behaves normally on benign samples but produces anomalous results on samples that contain the trigger pattern. Most existing backdoor attacks, however, have two significant drawbacks. First, their trigger patterns are visible and easy to detect by human inspection. Second, their injection process loses features of both the natural sample and the trigger pattern, which reduces the attack success rate and the model accuracy. In this paper, we propose a novel backdoor attack named SATBA that overcomes these limitations by using a spatial attention mechanism and a U-type model. Our attack uses the spatial attention mechanism to extract data features and generate invisible trigger patterns that are correlated with the clean data. It then uses the U-type model to plant these trigger patterns into the original data without causing noticeable feature loss. We evaluate our attack on three prominent image classification DNNs across three standard datasets and demonstrate that it achieves a high attack success rate and is robust against backdoor defenses. We also conduct extensive experiments on image similarity to highlight the stealthiness of our attack.
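
The two quantities the abstract refers to, model accuracy on benign samples and attack success rate on triggered samples, can be illustrated with a minimal sketch. It uses the commonly used definitions of these metrics; the abstract does not state the paper's exact evaluation protocol. The names `clean_accuracy`, `attack_success_rate`, `toy_predict`, and `toy_trigger` are hypothetical, and the toy model and trigger are stand-ins, not SATBA's spatial attention or U-type components.

```python
from typing import Callable, List, Sequence

Sample = Sequence[float]


def clean_accuracy(predict: Callable[[Sample], int],
                   samples: List[Sample],
                   labels: List[int]) -> float:
    """Fraction of benign (untriggered) samples classified correctly."""
    if not samples:
        raise ValueError("need at least one sample")
    correct = sum(1 for x, y in zip(samples, labels) if predict(x) == y)
    return correct / len(samples)


def attack_success_rate(predict: Callable[[Sample], int],
                        add_trigger: Callable[[Sample], Sample],
                        samples: List[Sample],
                        labels: List[int],
                        target_label: int) -> float:
    """Fraction of triggered samples classified as the attacker's target.

    Samples whose true label already equals the target are excluded,
    since they would count as successes without any attack.
    """
    eligible = [x for x, y in zip(samples, labels) if y != target_label]
    if not eligible:
        raise ValueError("no samples with a non-target label")
    hits = sum(1 for x in eligible if predict(add_trigger(x)) == target_label)
    return hits / len(eligible)


# Toy stand-ins (assumptions for illustration only).
def toy_predict(x: Sample) -> int:
    # Hypothetical backdoored classifier: the last feature acts as the trigger.
    if x[-1] >= 0.99:
        return 2
    return int(x[0] > 0.5)


def toy_trigger(x: Sample) -> Sample:
    # Hypothetical trigger: overwrite the last feature.
    return list(x[:-1]) + [1.0]


samples = [[0.1, 0.0], [0.9, 0.0], [0.7, 0.0], [0.2, 0.0]]
labels = [0, 1, 1, 1]

acc = clean_accuracy(toy_predict, samples, labels)
asr = attack_success_rate(toy_predict, toy_trigger, samples, labels, target_label=2)
```

In this toy setup the model misclassifies one of four benign samples and sends every triggered sample to the target label, which is the pattern a successful backdoor shows: near-normal clean accuracy alongside a high attack success rate.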

updated: Sun Mar 26 2023 14:23:10 GMT+0000 (UTC)

published: Sat Feb 25 2023 10:57:41 GMT+0000 (UTC)

arXiv
