Pre-trained Adversarial Perturbations

Yuanhao Ban; Yinpeng Dong

Self-supervised pre-training has drawn increasing attention in recent years due to its superior performance on numerous downstream tasks after fine-tuning. However, it is well known that deep learning models lack robustness to adversarial examples, which can also raise security issues for pre-trained models, although this risk has been less explored. In this paper, we delve into the robustness of pre-trained models by introducing Pre-trained Adversarial Perturbations (PAPs), which are universal perturbations crafted on pre-trained models that remain effective when attacking fine-tuned models without any knowledge of the downstream tasks. To this end, we propose a Low-Level Layer Lifting Attack (L4A) method that generates effective PAPs by lifting the neuron activations of low-level layers of the pre-trained models. Equipped with an enhanced noise augmentation strategy, L4A generates PAPs that transfer more effectively to fine-tuned models. Extensive experiments on typical pre-trained vision models and ten downstream tasks demonstrate that our method improves the attack success rate by a large margin compared with state-of-the-art methods.
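The idea of "lifting" low-level activations with a single universal perturbation can be illustrated with a toy sketch. The abstract does not give the loss, the layer choice, or the details of the noise augmentation, so everything below is an assumption made for illustration: a small linear map `W` stands in for a low-level layer, the objective is the squared L2 norm of its activations, the augmentation is plain Gaussian input noise, and the update is a signed gradient step projected onto an L-infinity ball. All function and parameter names are hypothetical and are not taken from the paper.

```python
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def activation_energy(W, x):
    """Squared L2 norm of the toy layer's activations W x (assumed objective)."""
    return sum(a * a for a in matvec(W, x))


def energy_grad(W, x):
    """Gradient of ||W x||^2 with respect to x, which is 2 * W^T (W x)."""
    a = matvec(W, x)
    return [2.0 * sum(W[i][j] * a[i] for i in range(len(W)))
            for j in range(len(x))]


def craft_universal_perturbation(W, inputs, eps=0.1, step=0.01,
                                 iters=50, noise_std=0.05, seed=0):
    """Find one perturbation, shared by all inputs, that raises activation energy."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    delta = [rng.uniform(-eps, eps) for _ in range(dim)]
    for _ in range(iters):
        grad = [0.0] * dim
        for x in inputs:
            # Assumed augmentation: Gaussian noise added to each perturbed input.
            noisy = [xi + di + rng.gauss(0.0, noise_std)
                     for xi, di in zip(x, delta)]
            g = energy_grad(W, noisy)
            grad = [gs + gi for gs, gi in zip(grad, g)]
        # Signed ascent step, then clip back into the L-infinity ball of radius eps.
        delta = [max(-eps, min(eps, d + step * ((g > 0) - (g < 0))))
                 for d, g in zip(delta, grad)]
    return delta


if __name__ == "__main__":
    W = [[1.0, 0.5], [0.0, 1.0]]          # toy stand-in for a low-level layer
    inputs = [[1.0, 1.0], [0.5, 2.0]]     # toy stand-in for training images
    delta = craft_universal_perturbation(W, inputs)
    clean = sum(activation_energy(W, x) for x in inputs)
    lifted = sum(activation_energy(W, [xi + di for xi, di in zip(x, delta)])
                 for x in inputs)
    print("perturbation:", delta)
    print("energy before / after:", clean, lifted)
```

In a realistic setting the linear map would be replaced by the early blocks of a pre-trained vision model and the gradient would come from automatic differentiation; the sketch only shows the structure of the loop: accumulate gradients over many inputs, update one shared perturbation, and keep it inside the norm budget.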

updated: Fri Oct 14 2022 12:37:24 GMT+0000 (UTC)

published: Fri Oct 07 2022 07:28:03 GMT+0000 (UTC)

arXiv
