Can Backdoor Attacks Survive Time-Varying Models?

Huiying Li; Arjun Nitin Bhagoji; Ben Y. Zhao; Haitao Zheng

Backdoors are powerful attacks against deep neural networks (DNNs). By poisoning training data, attackers can inject hidden rules (backdoors) into DNNs, which only activate on inputs containing attack-specific triggers. While existing work has studied backdoor attacks on a variety of DNN models, it considers only static models, which remain unchanged after initial deployment. In this paper, we study the impact of backdoor attacks on a more realistic scenario of time-varying DNN models, where model weights are updated periodically to handle drifts in data distribution over time. Specifically, we empirically quantify the "survivability" of a backdoor against model updates, and examine how attack parameters, data drift behaviors, and model update strategies affect backdoor survivability. Our results show that one-shot backdoor attacks (i.e., poisoning training data only once) do not survive past a few model updates, even when attackers aggressively increase trigger size and poison ratio. To stay unaffected by model updates, attackers must continuously introduce corrupted data into the training pipeline. Together, these results indicate that when models are updated to learn new data, they also "forget" backdoors as hidden, malicious features. The larger the distribution shift between old and new training data, the faster backdoors are forgotten. Leveraging these insights, we apply a smart learning rate scheduler to further accelerate backdoor forgetting during model updates, which prevents one-shot backdoors from surviving past a single model update.
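
To make the attack parameters named in the abstract concrete, the following is a minimal sketch of a one-shot poisoning step and of a simple attack success rate that could be re-measured after each model update to track survivability. It is not the authors' implementation. The function names (`stamp_trigger`, `poison_once`, `attack_success_rate`), the square bottom-right patch, and the list-of-rows image format are all assumptions made for illustration, and the success rate here is simplified to count every triggered input.

```python
import copy
import random


def stamp_trigger(image, trigger_size, value=1.0):
    """Return a copy of a 2-D image (list of rows) with a square patch
    of `value` in the bottom-right corner.
    Patch shape and position are illustrative assumptions."""
    stamped = copy.deepcopy(image)
    height, width = len(stamped), len(stamped[0])
    for r in range(height - trigger_size, height):
        for c in range(width - trigger_size, width):
            stamped[r][c] = value
    return stamped


def poison_once(images, labels, poison_ratio, trigger_size, target_label, seed=0):
    """One-shot poisoning: stamp the trigger on a random fraction
    (`poison_ratio`) of the samples and relabel them as `target_label`.
    Returns new lists plus the sorted indices of poisoned samples."""
    rng = random.Random(seed)
    n_poison = int(len(images) * poison_ratio)
    chosen = set(rng.sample(range(len(images)), n_poison))
    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in chosen:
            new_images.append(stamp_trigger(img, trigger_size))
            new_labels.append(target_label)
        else:
            new_images.append(copy.deepcopy(img))
            new_labels.append(lab)
    return new_images, new_labels, sorted(chosen)


def attack_success_rate(predict, clean_images, trigger_size, target_label):
    """Fraction of triggered inputs that `predict` maps to the target label.
    `predict` is any callable taking one image and returning a label."""
    hits = sum(
        predict(stamp_trigger(img, trigger_size)) == target_label
        for img in clean_images
    )
    return hits / len(clean_images)
```

Under these assumptions, a survivability curve would be obtained by calling `attack_success_rate` on held-out images after the initial poisoned training run and again after each update on clean data.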

updated: Wed Jun 08 2022 01:32:49 GMT+0000 (UTC)

published: Wed Jun 08 2022 01:32:49 GMT+0000 (UTC)

arXiv
