Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting

Qidong Huang; Xiaoyi Dong; Dongdong Chen; Yinpeng Chen; Lu Yuan; Gang Hua; Weiming Zhang; Nenghai Yu

In this paper, we investigate the adversarial robustness of vision transformers equipped with BERT pretraining (e.g., BEiT, MAE). A surprising observation is that MAE has significantly worse adversarial robustness than other BERT pretraining methods. This observation drives us to rethink the basic differences between these BERT pretraining methods and how these differences affect robustness against adversarial perturbations. Our empirical analysis reveals that the adversarial robustness of BERT pretraining is highly related to the reconstruction target: predicting the raw pixels of masked image patches degrades the model's adversarial robustness more than predicting the semantic context, since it guides the model to concentrate more on the medium-/high-frequency components of images. Based on our analysis, we provide a simple yet effective way to boost the adversarial robustness of MAE. The basic idea is to use dataset-extracted domain knowledge to occupy the medium-/high-frequency components of images, thus narrowing the optimization space of adversarial perturbations. Specifically, we cluster the pretraining data distribution and optimize a set of cluster-specific visual prompts in the frequency domain. These prompts are incorporated into input images through prototype-based prompt selection at test time. Extensive evaluation shows that our method clearly boosts MAE's adversarial robustness while maintaining its clean performance on ImageNet-1k classification. Our code is available at https://github.com/shikiw/RobustMAE.
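
The test-time step described in the abstract (pick a cluster-specific prompt via its prototype, then add it to the image's medium-/high-frequency band) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the circular low-frequency cutoff `radius`, the Euclidean nearest-prototype rule, and the additive way the prompt is combined with the spectrum are all assumptions made for illustration. The prompts and prototypes are taken as given here; how they are learned is outside the scope of the sketch.

```python
import numpy as np

def mid_high_mask(h, w, radius):
    """True outside a centered disc of the given radius (assumed cutoff),
    i.e. on the medium-/high-frequency part of a shifted spectrum."""
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    return dist > radius

def select_prompt(feature, prototypes):
    """Index of the prototype closest to `feature` (Euclidean distance;
    the distance measure is an assumption)."""
    return int(np.argmin(np.linalg.norm(prototypes - feature, axis=1)))

def apply_frequency_prompt(image, prompt, mask):
    """Add `prompt` to the masked band of the image spectrum.

    image:  (H, W) real array
    prompt: (H, W) real or complex array, one per cluster (hypothetical)
    mask:   (H, W) boolean array from mid_high_mask
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    spectrum = spectrum + mask * prompt
    # Keep the real part: a prompt that is not conjugate-symmetric
    # would otherwise yield a complex-valued image.
    return np.fft.ifft2(np.fft.ifftshift(spectrum)).real

# Usage with toy data: two clusters, 8x8 single-channel image.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
prototypes = np.array([[0.0, 0.0], [10.0, 10.0]])  # hypothetical cluster centers
prompts = rng.normal(size=(2, 8, 8))               # hypothetical learned prompts
feature = np.array([9.0, 9.0])                     # hypothetical image feature

mask = mid_high_mask(8, 8, radius=2)
k = select_prompt(feature, prototypes)
prompted = apply_frequency_prompt(image, prompts[k], mask)
```

Because the low-frequency disc is left untouched, the prompt only changes the band that, according to the abstract, adversarial perturbations would otherwise exploit.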

updated: Sun Aug 20 2023 16:27:17 GMT+0000 (UTC)

published: Sun Aug 20 2023 16:27:17 GMT+0000 (UTC)

arXiv
