An Embarrassingly Simple Backdoor Attack on Self-supervised Learning

Changjiang Li; Ren Pang; Zhaohan Xi; Tianyu Du; Shouling Ji; Yuan Yao; Ting Wang

As a new paradigm in machine learning, self-supervised learning (SSL) is capable of learning high-quality representations of complex data without relying on labels. In addition to eliminating the need for labeled data, research has found that SSL improves adversarial robustness over supervised learning, since the lack of labels makes it more challenging for adversaries to manipulate model predictions. However, the extent to which this robustness advantage generalizes to other types of attacks remains an open question. We explore this question in the context of backdoor attacks. Specifically, we design and evaluate CTRL, an embarrassingly simple yet highly effective self-supervised backdoor attack. By polluting only a tiny fraction of the training data (<= 1%) with indistinguishable poisoning samples, CTRL causes any trigger-embedded input to be misclassified as the adversary's designated class with high probability (>= 99%) at inference time. Our findings suggest that SSL and supervised learning are comparably vulnerable to backdoor attacks. More importantly, through the lens of CTRL, we study the inherent vulnerability of SSL to backdoor attacks. With both empirical and analytical evidence, we reveal that the representation invariance property of SSL, which benefits adversarial robustness, may also be the very reason that SSL is highly susceptible to backdoor attacks. Our findings also imply that existing defenses against supervised backdoor attacks are not easily retrofitted to the unique vulnerability of SSL.
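The two quantities quoted in the abstract, the poisoning rate (<= 1%) and the attack success rate (>= 99%), can be made concrete with a minimal sketch. This is not CTRL itself: the abstract does not describe the trigger or how poisoned samples are chosen, so the uniform random selection and the helper names `select_poison_indices` and `attack_success_rate` are assumptions made purely for illustration.

```python
import random


def select_poison_indices(num_samples, poison_rate, seed=0):
    """Pick which training samples to poison under a fixed budget.

    Assumption for illustration: indices are drawn uniformly at random.
    The abstract does not say how CTRL chooses its poisoned samples.
    """
    budget = int(poison_rate * num_samples)  # e.g. 1% of the training set
    rng = random.Random(seed)
    return sorted(rng.sample(range(num_samples), budget))


def attack_success_rate(predictions, target_class):
    """Fraction of trigger-embedded inputs classified as the target class."""
    if not predictions:
        return 0.0
    hits = sum(1 for p in predictions if p == target_class)
    return hits / len(predictions)


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 training samples at a 1% poisoning rate.
    poisoned = select_poison_indices(num_samples=1000, poison_rate=0.01)
    print(len(poisoned), "samples selected for poisoning")

    # Hypothetical model outputs on four trigger-embedded test inputs.
    print(attack_success_rate([3, 3, 3, 1], target_class=3))
```

Under these definitions, the abstract's claim corresponds to a poisoning rate of at most 0.01 producing an attack success rate of at least 0.99 on trigger-embedded inputs.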

updated: Mon Aug 14 2023 01:07:38 GMT+0000 (UTC)

published: Thu Oct 13 2022 20:39:21 GMT+0000 (UTC)

arXiv
