Classifier-independent Lower-Bounds for Adversarial Robustness

Elvis Dohmatob

We theoretically analyse the limits of robustness to test-time adversarial and noisy examples in classification. Our work focuses on deriving bounds which apply uniformly to all classifiers (i.e., all measurable functions from features to labels) for a given problem. Our contributions are two-fold. (1) We use optimal transport theory to derive variational formulae for the Bayes-optimal error a classifier can make on a given classification problem, subject to adversarial attacks. The optimal adversarial attack is then an optimal transport plan for a certain binary cost function induced by the specific attack model, and can be computed via a simple algorithm based on maximal matching on bipartite graphs. (2) We derive explicit lower bounds on the Bayes-optimal error in the case of the popular distance-based attacks. These bounds are universal in the sense that they depend on the geometry of the class-conditional distributions of the data, but not on a particular classifier. Our results are in sharp contrast with the existing literature, wherein adversarial vulnerability of classifiers is derived as a consequence of nonzero ordinary test error.
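
The matching idea in contribution (1) can be illustrated on a finite sample. The following is a minimal sketch, not the paper's own algorithm: it assumes two classes with equal numbers of samples (hence equal empirical priors), a Euclidean attack of budget eps, and an edge between two opposite-class points whenever they lie within 2 * eps of each other. Under these assumptions each matched pair shares a point reachable from both, so any classifier must err on at least one point of the pair, and M matched pairs give an adversarial error of at least M / (2n) on the sample. The function names, the toy data, and the use of Kuhn's augmenting-path algorithm are all illustrative choices.

```python
import math


def max_bipartite_matching(adj, n_right):
    """Size of a maximum matching, via Kuhn's augmenting-path algorithm.

    adj[u] lists the right-side nodes adjacent to left-side node u.
    """
    match_right = [-1] * n_right  # match_right[v] = left node matched to v

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be re-matched elsewhere
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    return sum(1 for u in range(len(adj)) if try_augment(u, set()))


def empirical_adversarial_error_bound(class0, class1, eps):
    """Lower bound M / (2n) on the adversarial error of any classifier
    on the sample (assumes len(class0) == len(class1) and an L2 attack)."""
    if len(class0) != len(class1):
        raise ValueError("this sketch assumes equally sized classes")
    n = len(class0)
    # Assumption: opposite-class points within 2 * eps can be pushed
    # onto a common point by an attacker with budget eps.
    adj = [[j for j, y in enumerate(class1) if math.dist(x, y) <= 2 * eps]
           for x in class0]
    return max_bipartite_matching(adj, n) / (2 * n)


# Hypothetical toy data: two pairs are confusable, one pair is far apart.
class0 = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
class1 = [(0.5, 0.0), (1.5, 0.0), (9.0, 0.0)]
bound = empirical_adversarial_error_bound(class0, class1, eps=0.3)
print(f"adversarial error of any classifier is at least {bound:.3f}")
```

With eps = 0 and no coinciding points the bound is 0, and once eps is large enough to connect every pair it reaches 1/2, the error of random guessing under balanced classes.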

updated: Tue Nov 10 2020 00:32:30 GMT+0000 (UTC)

published: Wed Jun 17 2020 16:46:39 GMT+0000 (UTC)

arXiv
