ESTAS: Effective and Stable Trojan Attacks in Self-supervised Encoders with One Target Unlabelled Sample

Jiaqi Xue; Qian Lou

Self-supervised learning (SSL) has become a popular approach to image representation encoding because it removes the reliance on labeled data and learns rich representations from large-scale, ubiquitous unlabeled data. A downstream classifier can then be trained on top of the pre-trained SSL image encoder with few or no labeled downstream data. Although extensive work shows that SSL achieves remarkable and competitive performance on a range of downstream tasks, its security concerns, e.g., Trojan attacks on SSL encoders, are still not well studied. In this work, we present a novel Trojan attack method, denoted ESTAS, that enables an effective and stable attack on SSL encoders with only one unlabeled target sample. In particular, we propose consistent trigger poisoning and cascade optimization in ESTAS to improve attack efficacy and model accuracy, and to eliminate the expensive extraction of target-class samples from large-scale, unordered, unlabeled data. Our extensive experiments on multiple datasets show that ESTAS stably achieves an attack success rate (ASR) above 99% with one target-class sample. Compared with prior work, ESTAS improves ASR by more than 30% and accuracy by more than 8.3% on average.
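
The abstract reports results as an attack success rate (ASR) but does not define the metric. The sketch below assumes the definition commonly used in the backdoor literature: the fraction of trigger-stamped test inputs, excluding those that already belong to the target class, that the downstream classifier assigns to the target class. The function name `attack_success_rate` and the toy labels are hypothetical and are not taken from the paper, whose exact evaluation protocol may differ.

```python
def attack_success_rate(predictions, true_labels, target_label):
    """Fraction of triggered, non-target inputs classified as the target class.

    Assumed definition (not confirmed by the abstract): samples whose true
    label is already the target class are excluded from the count.
    """
    # Keep only samples whose true class differs from the target class.
    pairs = [(p, t) for p, t in zip(predictions, true_labels) if t != target_label]
    if not pairs:
        raise ValueError("no non-target samples to evaluate")
    # Count how many of those were pushed into the target class.
    hits = sum(1 for p, _ in pairs if p == target_label)
    return hits / len(pairs)


# Toy example: predictions on trigger-stamped inputs, with target class 3.
preds = [3, 3, 3, 1, 3]
labels = [0, 1, 3, 1, 2]
asr = attack_success_rate(preds, labels, target_label=3)
```

In this toy example the third sample is dropped because its true label is already the target class, so the rate is computed over the remaining four samples.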

updated: Sun Nov 20 2022 08:58:34 GMT+0000 (UTC)

published: Sun Nov 20 2022 08:58:34 GMT+0000 (UTC)

arXiv
