Test-time Detection and Repair of Adversarial Samples via Masked Autoencoder

Yun-Yun Tsai; Ju-Chin Chao; Albert Wen; Zhaoyuan Yang; Chengzhi Mao; Tapan Shah; Junfeng Yang

Training-time defenses, known as adversarial training, incur high training costs and do not generalize to unseen attacks. Test-time defenses address these issues, but most existing test-time defenses require adapting the model weights; as a result, they do not work on frozen models and they complicate model memory management. The only test-time defense that does not adapt model weights instead adapts the input using self-supervision tasks. However, we found empirically that these self-supervision tasks are not sensitive enough to detect adversarial attacks accurately. In this paper, we propose DRAM, a novel defense method that detects and repairs adversarial samples at test time via a masked autoencoder (MAE). We demonstrate how to use MAE losses to build a Kolmogorov-Smirnov test that detects adversarial samples. Moreover, we use the MAE losses to calculate input reversal vectors that repair adversarial samples resulting from previously unseen attacks. Results on the large-scale ImageNet dataset show that, compared to all detection baselines evaluated, DRAM achieves the best detection rate (82% on average) on all eight adversarial attacks evaluated. For attack repair, DRAM improves robust accuracy by 6% to 41% for a standard ResNet50 and by 3% to 8% for a robust ResNet50, compared with baselines that use contrastive learning and rotation prediction.
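The detection idea in the abstract, a Kolmogorov-Smirnov test built on MAE losses, can be illustrated with a minimal sketch. The abstract does not say which losses are compared or how samples are batched, so everything below is an assumption: the function names (`ks_statistic`, `ks_critical_value`, `looks_adversarial`) are hypothetical, the loss values are synthetic stand-ins rather than real MAE reconstruction losses, and the decision rule is the textbook large-sample two-sample KS threshold, not necessarily the paper's procedure.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x, then compare the CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value for the two-sample KS test at level alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n + m) / (n * m))


def looks_adversarial(reference_losses, test_losses, alpha=0.05):
    """Flag the test batch if its loss distribution differs from the reference."""
    d = ks_statistic(reference_losses, test_losses)
    return d > ks_critical_value(len(reference_losses), len(test_losses), alpha)


# Synthetic stand-ins for MAE losses (assumption: attacked inputs have higher loss).
rng = random.Random(0)
clean_losses = [rng.gauss(0.5, 0.05) for _ in range(200)]
attacked_losses = [rng.gauss(0.8, 0.05) for _ in range(200)]
flagged = looks_adversarial(clean_losses, attacked_losses)
```

In this sketch the reference distribution would come from losses measured on inputs known to be clean, and a batch is flagged when its empirical loss distribution departs from that reference by more than the critical value.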

updated: Sun Apr 02 2023 21:27:16 GMT+0000 (UTC)

published: Wed Mar 22 2023 18:14:02 GMT+0000 (UTC)

arXiv
