Blind Backdoors in Deep Learning Models

Eugene Bagdasaryan; Vitaly Shmatikov

We investigate a new method for injecting backdoors into machine learning models, based on compromising the loss-value computation in the model-training code. We use it to demonstrate new classes of backdoors strictly more powerful than those in the prior literature: single-pixel and physical backdoors in ImageNet models, backdoors that switch the model to a covert, privacy-violating task, and backdoors that do not require inference-time input modifications. Our attack is blind: the attacker cannot modify the training data, observe the execution of the attack code, or access the resulting model. The attack code creates poisoned training inputs "on the fly," as the model is training, and uses multi-objective optimization to achieve high accuracy on both the main and backdoor tasks. We show how a blind attack can evade any known defense, and we propose new defenses.
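The abstract names three ingredients: a modified loss computation, poisoned inputs synthesized during training, and multi-objective optimization that balances the main and backdoor tasks. The toy sketch below illustrates how these pieces could fit together. It is not the authors' code. The function names, the single-pixel trigger position, and the choice of a closed-form two-objective min-norm weighting are all assumptions made for illustration; the abstract does not say which multi-objective method is used.

```python
from typing import List, Sequence

Image = List[List[float]]  # toy grayscale image as nested lists (assumption)


def add_single_pixel_trigger(image: Image, row: int = 0, col: int = 0,
                             value: float = 1.0) -> Image:
    """Return a copy of `image` with one pixel overwritten.

    Hypothetical stand-in for synthesizing a poisoned input "on the fly";
    the original image is left untouched.
    """
    poisoned = [list(r) for r in image]
    poisoned[row][col] = value
    return poisoned


def min_norm_coefficient(g_main: Sequence[float],
                         g_backdoor: Sequence[float]) -> float:
    """Closed-form alpha in [0, 1] that minimizes
    ||alpha * g_main + (1 - alpha) * g_backdoor||^2 for two gradients.

    This is one standard two-objective weighting, assumed here for
    illustration only.
    """
    diff_sq = sum((a - b) ** 2 for a, b in zip(g_main, g_backdoor))
    if diff_sq == 0.0:
        return 0.5  # identical gradients: any weighting is equivalent
    alpha = sum((b - a) * b for a, b in zip(g_main, g_backdoor)) / diff_sq
    return min(1.0, max(0.0, alpha))


def blended_loss(loss_main: float, loss_backdoor: float, alpha: float) -> float:
    """Weighted combination of the main-task and backdoor-task losses."""
    return alpha * loss_main + (1.0 - alpha) * loss_backdoor


clean = [[0.0, 0.0], [0.0, 0.0]]
poisoned = add_single_pixel_trigger(clean)
alpha = min_norm_coefficient([1.0, 0.0], [0.0, 1.0])
total = blended_loss(2.0, 4.0, alpha)
```

In a real training loop the two loss values and their gradients would come from the model's forward and backward passes on the clean and poisoned batches; the fixed numbers here only exercise the arithmetic.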

updated: Fri Feb 19 2021 04:45:28 GMT+0000 (UTC)

published: Fri May 08 2020 02:15:53 GMT+0000 (UTC)

arXiv
