SPADE: A Spectral Method for Black-Box Adversarial Robustness Evaluation

Wuxinlin Cheng; Chenhui Deng; Zhiqiang Zhao; Yaohui Cai; Zhiru Zhang; Zhuo Feng

We introduce a black-box spectral method for evaluating the adversarial robustness of a given machine learning (ML) model. Our approach, named SPADE, exploits a bijective distance mapping between the input and output graphs, which are constructed to approximate the manifolds corresponding to the input and output data. By leveraging the generalized Courant-Fischer theorem, we propose a SPADE score for evaluating the adversarial robustness of a given model, which is proved to be an upper bound on the best Lipschitz constant under the manifold setting. To reveal the most non-robust data samples, which are highly vulnerable to adversarial attacks, we develop a spectral graph embedding procedure that leverages the dominant generalized eigenvectors. This embedding step assigns each data sample a robustness score that can be further harnessed for more effective adversarial training. Our experiments show that the proposed SPADE method leads to promising empirical results for neural network models adversarially trained on the MNIST and CIFAR-10 data sets.
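The abstract does not spell out the computation, so the following is only a rough sketch of the kind of pipeline it describes, not the authors' implementation. It assumes that the input and output graphs are unweighted k-nearest-neighbor graphs, and that the score is taken as the largest eigenvalue of L_Y^+ L_X, where L_X and L_Y are the graph Laplacians and ^+ denotes the pseudoinverse. The function names `knn_laplacian` and `spectral_distortion_score`, the choice of k, and the dense NumPy formulation are all hypothetical choices made for illustration.

```python
import numpy as np


def knn_laplacian(points, k):
    """Dense Laplacian of an unweighted, symmetrized k-NN graph (illustrative)."""
    n = points.shape[0]
    # Pairwise squared Euclidean distances.
    sq = np.sum(points ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    neighbors = np.argsort(d2, axis=1)[:, :k]
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbors.ravel()] = 1.0
    adj = np.maximum(adj, adj.T)  # make the graph undirected
    return np.diag(adj.sum(axis=1)) - adj


def spectral_distortion_score(inputs, outputs, k=5):
    """Largest eigenvalue of pinv(L_Y) @ L_X and its eigenvector.

    Assumption: this stands in for the score described in the abstract;
    the exact definition should be checked against the paper.
    """
    l_x = knn_laplacian(inputs, k)
    l_y = knn_laplacian(outputs, k)
    m = np.linalg.pinv(l_y) @ l_x
    eigvals, eigvecs = np.linalg.eig(m)
    top = np.argmax(eigvals.real)
    return eigvals.real[top], eigvecs[:, top].real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(40, 8))          # stand-in for input samples
    w = rng.normal(size=(8, 3))
    y = np.tanh(x @ w)                    # stand-in for black-box model outputs
    score, vec = spectral_distortion_score(x, y, k=5)
    print("score:", score)
    print("eigenvector shape:", vec.shape)
```

Only input/output pairs are needed, which matches the black-box setting. When the outputs equal the inputs the two Laplacians coincide and the score is 1, so larger values indicate that nearby inputs are mapped to comparatively distant outputs. The dense pseudoinverse is practical only for small sample counts.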

updated: Sun Feb 07 2021 04:41:26 GMT+0000 (UTC)

published: Sun Feb 07 2021 04:41:26 GMT+0000 (UTC)

arXiv
