DP-InstaHide: Provably Defusing Poisoning and Backdoor Attacks with Differentially Private Data Augmentations

Eitan Borgnia; Jonas Geiping; Valeriia Cherepanova; Liam Fowl; Arjun Gupta; Amin Ghiasi; Furong Huang; Micah Goldblum; Tom Goldstein

Data poisoning and backdoor attacks manipulate training data to induce security breaches in a victim model. These attacks can be provably deflected using differentially private (DP) training methods, although this comes with a sharp decrease in model performance. The InstaHide method was recently proposed as an alternative to DP training; it relies on supposed privacy properties of the mixup augmentation but offers no rigorous guarantees. In this work, we show that strong data augmentations, such as mixup and random additive noise, nullify poison attacks while incurring only a small accuracy trade-off. To explain these findings, we propose a training method, DP-InstaHide, which combines the mixup regularizer with additive noise. A rigorous analysis of DP-InstaHide shows that mixup does indeed have privacy advantages, and that training with k-way mixup provably yields DP guarantees at least k times stronger than those of a naive DP mechanism. Because mixup (as opposed to noise) is beneficial to model performance, DP-InstaHide provides a mechanism for achieving stronger empirical performance against poisoning attacks than other known DP methods.
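
The combination described above, k-way mixup followed by additive noise, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `kway_mixup_with_noise`, the uniform 1/k mixing weights, the choice of a Laplace noise distribution, and the default `noise_scale` are all assumptions made for illustration; the abstract only states that mixup is combined with additive noise.

```python
import numpy as np


def kway_mixup_with_noise(images, labels, k=4, noise_scale=0.05, rng=None):
    """Average k randomly chosen samples, then add random noise to the inputs.

    images: array of shape (n, ...), e.g. (n, channels, height, width)
    labels: one-hot (or soft) labels of shape (n, num_classes)
    k: number of samples averaged per output (assumed uniform weights 1/k)
    noise_scale: scale of the additive noise (Laplace here, by assumption)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = images.shape[0]
    if k > n:
        raise ValueError("k must not exceed the number of samples")

    # For every output sample, draw k distinct source indices.
    idx = np.stack([rng.choice(n, size=k, replace=False) for _ in range(n)])

    # Uniform k-way mixup: average the selected inputs and their labels.
    mixed_x = images[idx].mean(axis=1)
    mixed_y = labels[idx].mean(axis=1)

    # Additive noise on the mixed inputs only; labels are left untouched.
    noise = rng.laplace(loc=0.0, scale=noise_scale, size=mixed_x.shape)
    return mixed_x + noise, mixed_y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((8, 3, 4, 4))               # toy batch of 8 "images"
    y = np.eye(2)[rng.integers(0, 2, size=8)]  # one-hot labels, 2 classes
    mx, my = kway_mixup_with_noise(x, y, k=4, rng=rng)
    print(mx.shape, my.shape)
```

Intuitively, averaging k samples shrinks the contribution of any single (possibly poisoned) training point to each mixed input by a factor of k, which fits the abstract's claim that k-way mixup strengthens the DP guarantee relative to adding noise alone.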

updated: Tue Mar 02 2021 23:07:31 GMT+0000 (UTC)

published: Tue Mar 02 2021 23:07:31 GMT+0000 (UTC)

arXiv
