Efficient Backdoor Attacks for Deep Neural Networks in Real-world Scenarios

Hong Sun; Ziqiang Li; Pengfei Xia; Heng Li; Beihao Xia; Yi Wu; Bin Li

Recent deep neural networks (DNNs) rely on vast amounts of training data, which gives malicious attackers an opportunity to contaminate that data and carry out backdoor attacks. These attacks significantly undermine the reliability of DNNs. However, existing backdoor attack methods rest on unrealistic assumptions: that all training data comes from a single source and that attackers have full access to it. In this paper, we address this limitation by introducing a more realistic attack scenario in which victims collect data from multiple sources and attackers cannot access the complete training data. We refer to this scenario as data-constrained backdoor attacks. In such cases, previous attack methods suffer severe efficiency degradation due to the entanglement between benign and poisoning features during the backdoor injection process. To tackle this problem, we propose a novel approach that leverages the pre-trained Contrastive Language-Image Pre-Training (CLIP) model. We introduce three CLIP-based technologies from two distinct streams: Clean Feature Suppression, which suppresses the influence of clean features to enhance the prominence of poisoning features, and Poisoning Feature Augmentation, which augments the presence and impact of poisoning features to effectively manipulate the model's behavior. To evaluate the effectiveness, harmlessness to benign accuracy, and stealthiness of our method, we conduct extensive experiments on 3 target models, 3 datasets, and over 15 different settings. The results demonstrate remarkable improvements, with some settings achieving over 100% improvement compared to existing attacks in data-constrained scenarios. Our research addresses the limitations of existing methods and provides a practical and effective solution for data-constrained backdoor attacks.

updated: Wed Jun 14 2023 09:21:48 GMT+0000 (UTC)

published: Wed Jun 14 2023 09:21:48 GMT+0000 (UTC)

arXiv
