COLLIDER: A Robust Training Framework for Backdoor Data

Hadi M. Dolatabadi; Sarah Erfani; Christopher Leckie

Deep neural network (DNN) classifiers are vulnerable to backdoor attacks. In such attacks, an adversary poisons a portion of the training data by installing a trigger. The goal is to make the trained DNN output the attacker's desired class whenever the trigger is activated, while performing normally on clean data. Various approaches have recently been proposed to detect malicious backdoored DNNs. However, a robust, end-to-end training approach for backdoor-poisoned data, analogous to adversarial training, has yet to be discovered. In this paper, we take the first step toward such methods by developing a robust training framework, COLLIDER, that selects the most prominent samples by exploiting the underlying geometric structures of the data. Specifically, we filter out candidate poisoned data at each training epoch by solving a geometrical coreset selection objective. We first argue that clean data samples exhibit (1) gradients similar to those of the clean majority of the data and (2) low local intrinsic dimensionality (LID). Based on these criteria, we define a novel coreset selection objective to find such samples, which are then used to train a DNN. We show the effectiveness of the proposed method for robust training of DNNs on various poisoned datasets, where it significantly reduces the backdoor success rate.
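To make the LID criterion in the abstract more concrete, the following is a minimal sketch of the widely used maximum-likelihood LID estimator based on k-nearest-neighbor distances. The abstract does not say which estimator COLLIDER uses, so the estimator choice, the function name `lid_mle`, and the parameters `k` and `exclude_self` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def lid_mle(query, reference, k=20, exclude_self=False):
    """Maximum-likelihood LID estimate for each row of `query`.

    Hypothetical helper for illustration only; not taken from the paper.
    query:     array of shape (n, d)
    reference: array of shape (m, d), the points used as neighbors
    k:         number of nearest neighbors used in the estimate
    exclude_self: set True when `query` rows also appear in `reference`,
                  so that the zero self-distance is skipped
    """
    # Pairwise Euclidean distances, shape (n, m).
    dists = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=-1)
    dists = np.sort(dists, axis=1)

    # Keep the k nearest neighbors (optionally skipping the self-match).
    start = 1 if exclude_self else 0
    knn = dists[:, start:start + k]
    r_max = knn[:, -1:]  # distance to the k-th neighbor

    # LID = -1 / mean(log(r_i / r_k)); clamp values to avoid log(0) and 1/0.
    eps = 1e-12
    log_ratios = np.log(np.maximum(knn, eps) / np.maximum(r_max, eps))
    return -1.0 / np.minimum(log_ratios.mean(axis=1), -eps)
```

In a training setting, one would typically call such a function on feature representations of a mini-batch, with the batch itself as the reference set and `exclude_self=True`. Under the abstract's argument, samples with unusually high estimates would then be treated as candidate poisoned points.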

updated: Thu Oct 13 2022 03:48:46 GMT+0000 (UTC)

published: Thu Oct 13 2022 03:48:46 GMT+0000 (UTC)

arXiv
