BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning

Jinyuan Jia; Yupei Liu; Neil Zhenqiang Gong

Self-supervised learning in computer vision aims to pre-train an image encoder using a large amount of unlabeled images or (image, text) pairs. The pre-trained image encoder can then be used as a feature extractor to build downstream classifiers for many downstream tasks with little or no labeled training data. In this work, we propose BadEncoder, the first backdoor attack on self-supervised learning. In particular, BadEncoder injects backdoors into a pre-trained image encoder such that the downstream classifiers built on the backdoored image encoder for different downstream tasks simultaneously inherit the backdoor behavior. We formulate BadEncoder as an optimization problem and propose a gradient-descent-based method to solve it, which produces a backdoored image encoder from a clean one. Our extensive empirical evaluation on multiple datasets shows that BadEncoder achieves high attack success rates while preserving the accuracy of the downstream classifiers. We also show the effectiveness of BadEncoder using two publicly available, real-world image encoders, i.e., Google's image encoder pre-trained on ImageNet and OpenAI's Contrastive Language-Image Pre-training (CLIP) image encoder pre-trained on 400 million (image, text) pairs collected from the Internet. Moreover, we consider defenses including Neural Cleanse and MNTD (empirical defenses) as well as PatchGuard (a provable defense). Our results show that these defenses are insufficient to defend against BadEncoder, highlighting the need for new defenses against BadEncoder. Our code is publicly available at: https://github.com/jjy1994/BadEncoder.
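
The abstract states only that the attack is formulated as an optimization problem and solved with gradient descent, starting from a clean encoder. The toy sketch below illustrates that general shape and is not the authors' implementation. Everything in it is an assumption made for illustration: the encoder is a plain linear map, the trigger is additive, both loss terms are squared errors, and the names backdoor_encoder, losses, f_ref and lam are hypothetical. One term pulls the features of triggered inputs toward a reference feature vector; the other keeps the features of clean inputs close to those of the clean encoder.

```python
import numpy as np


def losses(W, W_clean, X_clean, X_trig, f_ref, lam):
    """Return (effectiveness, utility, total) for the toy objective.

    Assumed squared-error terms, not the loss used in the paper:
      effectiveness: triggered inputs should map near the reference feature f_ref
      utility:       clean inputs should keep the features of the clean encoder
    """
    eff = np.mean(np.sum((X_trig @ W.T - f_ref) ** 2, axis=1))
    uti = np.mean(np.sum((X_clean @ (W - W_clean).T) ** 2, axis=1))
    return eff, uti, eff + lam * uti


def backdoor_encoder(W_clean, X_clean, trigger, f_ref, lam=1.0, lr=0.01, steps=2000):
    """Plain gradient descent on the toy objective, initialised at the clean encoder."""
    W = W_clean.copy()
    X_trig = X_clean + trigger  # additive trigger (an assumption)
    n = X_clean.shape[0]
    for _ in range(steps):
        r_eff = X_trig @ W.T - f_ref       # residuals of the effectiveness term
        r_uti = X_clean @ (W - W_clean).T  # residuals of the utility term
        grad = (2.0 / n) * (r_eff.T @ X_trig + lam * (r_uti.T @ X_clean))
        W -= lr * grad
    return W


# Hypothetical toy data: 200 inputs of dimension 6, features of dimension 4.
rng = np.random.default_rng(0)
X_clean = rng.normal(size=(200, 6))
W_clean = rng.normal(size=(4, 6))
trigger = np.zeros(6)
trigger[-2:] = 2.0                     # the trigger perturbs the last two coordinates
f_ref = W_clean @ rng.normal(size=6)   # clean feature of a made-up reference input

W_bd = backdoor_encoder(W_clean, X_clean, trigger, f_ref)
before = losses(W_clean, W_clean, X_clean, X_clean + trigger, f_ref, 1.0)
after = losses(W_bd, W_clean, X_clean, X_clean + trigger, f_ref, 1.0)
print("effectiveness loss before/after:", before[0], after[0])
print("utility loss before/after:", before[1], after[1])
```

A linear map cannot satisfy both terms exactly, so the sketch only shows the trade-off that the weight lam controls: the effectiveness loss drops relative to the clean encoder while the utility loss rises from zero. Real image encoders are deep networks, for which the two goals conflict far less.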

updated: Sun Aug 01 2021 02:22:31 GMT+0000 (UTC)

published: Sun Aug 01 2021 02:22:31 GMT+0000 (UTC)

arXiv

