PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning

Hongbin Liu; Jinyuan Jia; Neil Zhenqiang Gong

Contrastive learning pre-trains an image encoder on a large amount of unlabeled data so that the encoder can serve as a general-purpose feature extractor for various downstream tasks. In this work, we propose PoisonedEncoder, a data poisoning attack on contrastive learning. In particular, an attacker injects carefully crafted poisoning inputs into the unlabeled pre-training data so that downstream classifiers built on the poisoned encoder for multiple target downstream tasks simultaneously classify arbitrary attacker-chosen clean inputs as arbitrary attacker-chosen classes. We formulate our data poisoning attack as a bilevel optimization problem whose solution is the set of poisoning inputs, and we propose a method tailored to contrastive learning that solves it approximately. Our evaluation on multiple datasets shows that PoisonedEncoder achieves high attack success rates while maintaining the testing accuracy of the downstream classifiers built on the poisoned encoder for inputs the attacker did not choose. We also evaluate five defenses against PoisonedEncoder: one pre-processing, three in-processing, and one post-processing defense. Our results show that these defenses can decrease the attack success rate of PoisonedEncoder, but they either sacrifice the utility of the encoder or require a large clean pre-training dataset.
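To make "bilevel optimization problem" concrete, here is a generic template for this kind of poisoning objective. It is a sketch under stated assumptions, not the paper's exact formulation. The following symbols are all hypothetical notation chosen here for illustration:

- D is the clean unlabeled pre-training set and P is the set of poisoning inputs.
- N is a budget on how many poisoning inputs may be injected.
- f_theta is the encoder and L_CL is the contrastive loss.
- T is the number of target downstream tasks, with (x_i, y_i) the attacker-chosen input and class for task i.
- g_i is the downstream classifier trained on features from the pre-trained encoder.

The paper may use a differentiable surrogate in place of the indicator below.

```latex
% Hypothetical notation; a generic bilevel poisoning template, not the paper's exact objective.
\begin{aligned}
\max_{\mathcal{P}} \quad & \sum_{i=1}^{T} \mathbb{1}\!\left[\, g_i\!\big(f_{\theta^*(\mathcal{P})}(x_i)\big) = y_i \,\right] \\
\text{s.t.} \quad & \theta^*(\mathcal{P}) = \arg\min_{\theta} \; \mathcal{L}_{\mathrm{CL}}\big(\mathcal{D} \cup \mathcal{P};\, \theta\big), \qquad |\mathcal{P}| \le N
\end{aligned}
```

The upper level asks the attacker-chosen inputs to land in the attacker-chosen classes. The lower level is the victim's entire contrastive pre-training run on the poisoned data. Solving the problem exactly would therefore require re-running pre-training for every candidate P, which is why an approximate, contrastive-learning-specific method is needed. The second goal, preserving accuracy on inputs the attacker did not choose, is not an explicit term in this template.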

updated: Fri May 13 2022 00:15:44 GMT+0000 (UTC)

published: Fri May 13 2022 00:15:44 GMT+0000 (UTC)

arXiv
