UMD: Unsupervised Model Detection for X2X Backdoor Attacks

Zhen Xiang; Zidi Xiong; Bo Li

Backdoor (Trojan) attacks are a common threat to deep neural networks: samples from one or more source classes, once embedded with a backdoor trigger, are misclassified to adversarial target classes. Existing methods for detecting whether a classifier has been backdoor-attacked are mostly designed for attacks with a single adversarial target (e.g., all-to-one attacks). To the best of our knowledge, no existing method can, without supervision, effectively address the more general X2X attack with an arbitrary number of source classes, each paired with an arbitrary target class. In this paper, we propose UMD, the first Unsupervised Model Detection method that effectively detects X2X backdoor attacks via joint inference of the adversarial (source, target) class pairs. In particular, we first define a novel transferability statistic to measure and select a subset of putative backdoor class pairs based on a proposed clustering approach. Then, these selected class pairs are jointly assessed based on an aggregation of their reverse-engineered trigger sizes for detection inference, using a robust, unsupervised anomaly detector that we propose. We conduct comprehensive evaluations on the CIFAR-10, GTSRB, and Imagenette datasets, and show that our unsupervised UMD outperforms SOTA detectors (even those with supervision) by 17%, 4%, and 8%, respectively, in detection accuracy against diverse X2X attacks. We also show the strong detection performance of UMD against several strong adaptive attacks.
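
To make the idea of unsupervised anomaly detection on reverse-engineered trigger sizes more concrete, the following is a minimal sketch and not the authors' method. It assumes that a trigger size has already been estimated for every (source, target) class pair, and it swaps in a generic median-absolute-deviation (MAD) outlier test applied to each pair independently; UMD's transferability statistic, clustering step, and joint aggregation are not reproduced here. The function names, the example sizes, and the threshold of 3.5 are all hypothetical.

```python
from statistics import median


def robust_scores(trigger_sizes):
    """Score each class pair by how far its trigger size falls BELOW the
    median, in units of scaled MAD. A large positive score means an
    unusually small trigger, which is the typical backdoor signature."""
    values = list(trigger_sizes.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # makes MAD comparable to a Gaussian std. deviation
    if scale == 0:
        # All sizes (nearly) identical: nothing stands out.
        return {pair: 0.0 for pair in trigger_sizes}
    return {pair: (med - v) / scale for pair, v in trigger_sizes.items()}


def flag_pairs(trigger_sizes, threshold=3.5):
    """Return the class pairs whose score exceeds the (assumed) threshold."""
    scores = robust_scores(trigger_sizes)
    return sorted(pair for pair, s in scores.items() if s > threshold)


# Hypothetical trigger sizes (e.g., perturbation norms) for a 3-class model.
example_sizes = {
    (0, 1): 21.0, (0, 2): 19.5,
    (1, 0): 20.4, (1, 2): 22.1,
    (2, 0): 3.2,  (2, 1): 20.9,
}
flagged = flag_pairs(example_sizes)
model_is_suspicious = len(flagged) > 0
```

In this toy input, only the pair (2, 0) has a trigger far smaller than the rest, so it is the only one flagged. A per-pair test like this is expected to weaken when many class pairs are attacked at once, because the median itself shifts; that limitation is one motivation for assessing selected pairs jointly, as the abstract describes.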

updated: Wed Nov 15 2023 21:51:23 GMT+0000 (UTC)

published: Mon May 29 2023 23:06:05 GMT+0000 (UTC)

arXiv
