Data-free Defense of Black Box Models Against Adversarial Attacks

Gaurav Kumar Nayak; Inder Khatri; Ruchit Rawal; Anirban Chakraborty

Companies often safeguard their trained deep models (i.e., architecture details, learnt weights, training details, etc.) from third-party users by exposing them only as black boxes through APIs. Moreover, they may not even provide access to the training data, for proprietary reasons or because of sensitivity concerns. In this work, we propose a novel defense mechanism for black box models against adversarial attacks in a data-free setting. We construct synthetic data via a generative model and train a surrogate network using model stealing techniques. To minimize adversarial contamination in perturbed samples, we propose a 'wavelet noise remover' (WNR) that performs discrete wavelet decomposition on input images and carefully selects only a few important coefficients, as determined by our 'wavelet coefficient selection module' (WCSM). To recover the high-frequency content of the image after noise removal via WNR, we further train a 'regenerator' network whose objective is to retrieve the coefficients such that the reconstructed image yields predictions on the surrogate model similar to those for the original image. At test time, WNR combined with the trained regenerator network is prepended to the black box network, resulting in a large boost in adversarial accuracy. Our method improves adversarial accuracy on CIFAR-10 against the state-of-the-art Auto Attack by 38.98% and 32.01% over the baseline, even when the attacker uses surrogate architectures (Alexnet-half and Alexnet) similar to the black box architecture (Alexnet) and the same model stealing strategy as the defender. The code is available at https://github.com/vcl-iisc/data-free-black-box-defense
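
The coefficient-selection idea behind WNR can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single-level Haar transform on a 2D grayscale array with even height and width, and it substitutes a simple magnitude-based top-k rule for the WCSM, whose actual selection criterion is not described in the abstract. The function names `haar_dwt2`, `haar_idwt2`, and `wavelet_denoise`, and the parameter `keep_fraction`, are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt2(x):
    """One-level orthonormal 2D Haar transform (even height and width assumed)."""
    a = (x[0::2, :] + x[1::2, :]) / SQRT2   # sums of vertical pixel pairs
    d = (x[0::2, :] - x[1::2, :]) / SQRT2   # differences of vertical pixel pairs
    ll = (a[:, 0::2] + a[:, 1::2]) / SQRT2  # approximation band
    lh = (a[:, 0::2] - a[:, 1::2]) / SQRT2  # detail bands
    hl = (d[:, 0::2] + d[:, 1::2]) / SQRT2
    hh = (d[:, 0::2] - d[:, 1::2]) / SQRT2
    return ll, (lh, hl, hh)

def haar_idwt2(ll, details):
    """Exact inverse of haar_dwt2."""
    lh, hl, hh = details
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    d[:, 0::2], d[:, 1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    x = np.empty((2 * h, 2 * w))
    x[0::2, :], x[1::2, :] = (a + d) / SQRT2, (a - d) / SQRT2
    return x

def wavelet_denoise(img, keep_fraction=0.1):
    """Keep the approximation band plus the largest-magnitude detail coefficients.

    ASSUMPTION: magnitude-based top-k selection stands in for the paper's WCSM.
    """
    ll, details = haar_dwt2(np.asarray(img, dtype=float))
    stacked = np.stack(details)
    k = max(1, int(keep_fraction * stacked.size))
    # Magnitude of the k-th largest detail coefficient.
    threshold = np.partition(np.abs(stacked).ravel(), -k)[-k]
    kept = np.where(np.abs(stacked) >= threshold, stacked, 0.0)
    return haar_idwt2(ll, tuple(kept))
```

For a colour image, one could apply `wavelet_denoise` to each channel separately. In the method described above, the output of this step would then pass through the trained regenerator network before reaching the black box model; that network is not sketched here.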

updated: Wed Jun 28 2023 21:38:48 GMT+0000 (UTC)

published: Thu Nov 03 2022 04:19:27 GMT+0000 (UTC)

arXiv
