DAD: Data-free Adversarial Defense at Test Time

Gaurav Kumar Nayak; Ruchit Rawal; Anirban Chakraborty

Deep models are highly susceptible to adversarial attacks. Such attacks add carefully crafted, imperceptible noise that can fool the network and cause severe consequences once the model is deployed. To counter them, a model requires training data for adversarial training or explicit regularization-based techniques. However, privacy has become an important concern, restricting access to the trained models only and not to the training data (e.g. biometric data). Also, data curation is expensive and companies may have proprietary rights over it. To handle such situations, we propose a completely novel problem of 'test-time adversarial defense in the absence of training data and even their statistics'. We solve it in two stages: a) detection and b) correction of adversarial samples. Our adversarial sample detection framework is initially trained on arbitrary data and is subsequently adapted to the unlabelled test data through unsupervised domain adaptation. We further correct the predictions on detected adversarial samples by transforming the samples into the Fourier domain and passing only their low-frequency components, obtained at our proposed suitable radius, to the model for prediction. We demonstrate the efficacy of our proposed technique via extensive experiments against several adversarial attacks and for different model architectures and datasets. For a non-robust ResNet-18 model pre-trained on CIFAR-10, our detection method correctly identifies 91.42% of adversaries. Also, we significantly improve the adversarial accuracy from 0% to 37.37%, with a minimal drop of 0.02% in clean accuracy, on the state-of-the-art 'Auto Attack' without having to retrain the model.
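The correction stage described above can be illustrated with a minimal sketch. It assumes a hard circular low-pass mask applied to the centred 2D spectrum, which is one common way to keep low-frequency components and is not necessarily the authors' exact procedure. The function names, the `radius` parameter, and the `model` and `is_adversarial` callables are hypothetical; the abstract does not specify how the radius is selected or how the detector is built.

```python
import numpy as np


def low_pass_filter(image: np.ndarray, radius: float) -> np.ndarray:
    """Keep only the frequency components within `radius` of the DC term.

    image:  float array of shape (H, W) or (H, W, C).
    radius: cut-off in frequency-index units. Hypothetical parameter: the
            abstract says a suitable radius is proposed but not how.
    """
    h, w = image.shape[:2]

    # After fftshift the DC component sits at index (h // 2, w // 2).
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    mask = dist <= radius
    if image.ndim == 3:
        mask = mask[..., None]  # broadcast the same mask over channels

    # Forward transform over the spatial axes, centre it, mask, and invert.
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    filtered = np.fft.ifft2(
        np.fft.ifftshift(spectrum * mask, axes=(0, 1)), axes=(0, 1)
    )
    return filtered.real


def predict_with_defense(image, model, is_adversarial, radius):
    """Two-stage sketch: detect, then correct only the flagged samples.

    `model` and `is_adversarial` are placeholder callables standing in for
    the pre-trained classifier and the adapted detector.
    """
    if is_adversarial(image):
        image = low_pass_filter(image, radius)
    return model(image)
```

A radius of 0 keeps only the per-channel mean, while a radius larger than the image diagonal returns the input unchanged, so the useful range lies between those extremes. Clean samples bypass the filter entirely in this sketch.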

updated: Fri Apr 08 2022 16:03:30 GMT+0000 (UTC)

published: Mon Apr 04 2022 15:16:13 GMT+0000 (UTC)

arXiv
