Red Alarm for Pre-trained Models: Universal Vulnerabilities by Neuron-Level Backdoor Attacks

Zhengyan Zhang; Guangxuan Xiao; Yongwei Li; Tian Lv; Fanchao Qi; Zhiyuan Liu; Yasheng Wang; Xin Jiang; Maosong Sun

Owing to the success of pre-trained models (PTMs), practitioners usually fine-tune an existing PTM for downstream tasks. Most PTMs are contributed and maintained by open-source communities and may therefore suffer from backdoor attacks. In this work, we demonstrate the universal vulnerabilities of PTMs, where fine-tuned models can be easily controlled by backdoor attacks without any knowledge of the downstream tasks. Specifically, the attacker can add a simple pre-training task that restricts the output hidden states of trigger instances to pre-defined target embeddings; we call this the neuron-level backdoor attack (NeuBA). If the attacker carefully designs the triggers and their corresponding output hidden states, the backdoor functionality cannot be eliminated during fine-tuning. In experiments on both natural language processing (NLP) and computer vision (CV) tasks, we show that NeuBA fully controls the predictions for trigger instances while not affecting model performance on clean data. Finally, we find that re-initialization cannot resist NeuBA, and we discuss several possible directions for alleviating the universal vulnerabilities. Our findings sound a red alarm for the wide use of PTMs. Our source code and data can be accessed at https://github.com/thunlp/NeuBA.
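
To make the idea of the added pre-training task concrete, the following is a minimal sketch, not the authors' implementation (see the repository above for that). It assumes a toy linear encoder in place of a real PTM, a mean-squared-error objective, and inputs whose last feature is fixed to 1.0 as a stand-in for a trigger. The names `encode`, `backdoor_loss`, and `backdoor_step` are hypothetical. The sketch also omits the original pre-training objective, which an attacker would presumably keep alongside the backdoor term so that behaviour on clean data is preserved.

```python
import random


def encode(weights, x):
    """Toy linear 'encoder' (assumption): output hidden state is W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def backdoor_loss(weights, triggered_inputs, target):
    """Mean squared distance between hidden states of triggered inputs
    and the pre-defined target embedding."""
    total = 0.0
    for x in triggered_inputs:
        h = encode(weights, x)
        total += sum((hi - ti) ** 2 for hi, ti in zip(h, target))
    return total / len(triggered_inputs)


def backdoor_step(weights, triggered_inputs, target, lr=0.05):
    """One gradient-descent step on backdoor_loss.
    The gradient is derived by hand for the linear encoder."""
    n = len(triggered_inputs)
    grads = [[0.0] * len(weights[0]) for _ in weights]
    for x in triggered_inputs:
        h = encode(weights, x)
        for i, (hi, ti) in enumerate(zip(h, target)):
            for j, xj in enumerate(x):
                grads[i][j] += 2.0 * (hi - ti) * xj / n
    return [[w - lr * g for w, g in zip(row, grow)]
            for row, grow in zip(weights, grads)]


rng = random.Random(0)

# Hypothetical setup: 3-dimensional hidden state, 4 input features.
weights = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(3)]

# "Triggered" inputs: random features plus a constant trigger feature of 1.0.
triggered = [[rng.uniform(-1.0, 1.0) for _ in range(3)] + [1.0]
             for _ in range(8)]

# Pre-defined target embedding chosen by the attacker (illustrative values).
target = [1.0, -1.0, 1.0]

initial_loss = backdoor_loss(weights, triggered, target)
for _ in range(200):
    weights = backdoor_step(weights, triggered, target)
final_loss = backdoor_loss(weights, triggered, target)

print(f"backdoor loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

In this toy setting the loss shrinks because the encoder learns to map the trigger feature directly onto the target embedding. A downstream classifier that reads the hidden state would then see nearly the same representation for every triggered input, which is the mechanism the abstract describes.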

updated: Tue Jan 19 2021 05:23:52 GMT+0000 (UTC)

published: Mon Jan 18 2021 10:18:42 GMT+0000 (UTC)

arXiv
