Poison as a Cure: Detecting & Neutralizing Variable-Sized Backdoor Attacks in Deep Neural Networks

Alvin Chan; Yew-Soon Ong

Deep learning models have recently been shown to be vulnerable to backdoor poisoning, an insidious attack in which the victim model predicts clean images correctly but classifies the same images as the target class when a trigger poison pattern is added. The adversary can embed this poison pattern in the training dataset. Existing defenses are effective only under certain conditions, such as when the poison pattern is small, when the ratio of poisoned training samples is known, or when a validated clean dataset is available. Since a defender may not have such prior knowledge or resources, we propose a defense against backdoor poisoning that is effective even when those prerequisites are not met. It comprises several parts that extract a backdoor poison signal, detect the poison target and base classes, and filter poisoned samples from clean ones with proven guarantees. The final part of our defense retrains the poisoned model on a dataset augmented with the extracted poison signal, with corrective relabeling of poisoned samples, to neutralize the backdoor. Our approach is shown to be effective in defending against backdoor attacks that use both small and large poison patterns on nine different target-base class pairs from the CIFAR10 dataset.
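To make the threat model concrete, here is a minimal sketch of how a backdoor-poisoned training set might be built: a trigger patch is stamped onto some base-class images, which are then relabeled as the target class. It illustrates the attack only, not the defense proposed in the paper. The function names, the grayscale nested-list image format, the patch position, and the `ratio` parameter are all assumptions made for illustration and do not come from the paper.

```python
import random

def stamp_trigger(image, trigger, top=0, left=0):
    """Return a copy of `image` with `trigger` overwritten at (top, left).

    Images are nested lists of pixel values (an assumed toy format).
    """
    stamped = [row[:] for row in image]
    for i, trigger_row in enumerate(trigger):
        for j, value in enumerate(trigger_row):
            stamped[top + i][left + j] = value
    return stamped

def poison_dataset(images, labels, trigger, base_class, target_class,
                   ratio, seed=0):
    """Poison a fraction `ratio` of the base-class samples.

    Each selected sample gets the trigger stamped on it and its label
    changed to `target_class`. All other samples are copied unchanged.
    Returns (new_images, new_labels, poisoned_indices).
    """
    rng = random.Random(seed)
    base_idx = [i for i, y in enumerate(labels) if y == base_class]
    n_poison = int(len(base_idx) * ratio)
    chosen = set(rng.sample(base_idx, n_poison))

    new_images, new_labels = [], []
    for i, (img, y) in enumerate(zip(images, labels)):
        if i in chosen:
            new_images.append(stamp_trigger(img, trigger))
            new_labels.append(target_class)
        else:
            new_images.append([row[:] for row in img])
            new_labels.append(y)
    return new_images, new_labels, sorted(chosen)
```

A model trained on such a set can learn to associate the patch with the target class while still behaving normally on clean inputs, which is why the attack is hard to notice from clean-data accuracy alone.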

updated: Tue Nov 19 2019 01:59:59 GMT+0000 (UTC)

published: Tue Nov 19 2019 01:59:59 GMT+0000 (UTC)

arXiv
