A Unified Game-Theoretic Interpretation of Adversarial Robustness

Jie Ren; Die Zhang; Yisen Wang; Lu Chen; Zhanpeng Zhou; Yiting Chen; Xu Cheng; Xin Wang; Meng Zhou; Jie Shi; Quanshi Zhang

This paper provides a unified view for explaining different adversarial attacks and defense methods, namely the view of multi-order interactions between the input variables of DNNs. Based on multi-order interactions, we discover that adversarial attacks mainly affect high-order interactions to fool the DNN. Furthermore, we find that the robustness of adversarially trained DNNs comes from category-specific low-order interactions. Our findings provide a potential way to unify adversarial perturbations and robustness, which can explain existing defense methods in a principled way. In addition, our findings revise a previous, inaccurate understanding of the shape bias of adversarially learned features.
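To make "multi-order interaction" concrete, the sketch below computes it exactly for a toy set function. It assumes the definition commonly used in this line of game-theoretic work (not stated in the abstract above): for input variables i and j among n variables, the m-th order interaction is the average, over all contexts S of m other variables, of f(S ∪ {i, j}) − f(S ∪ {i}) − f(S ∪ {j}) + f(S). Low orders correspond to small contexts, high orders to contexts containing most of the input. The helper names `delta_f` and `interaction_order_m`, the toy score `toy_f`, and `N_VARS` are hypothetical. Exact enumeration is only feasible for a handful of variables; a real DNN input would require sampling contexts.

```python
from itertools import combinations


def delta_f(f, i, j, S):
    """Interaction gain of variables i and j given the context S (a frozenset)."""
    return f(S | {i, j}) - f(S | {i}) - f(S | {j}) + f(S)


def interaction_order_m(f, n, i, j, m):
    """Exact m-th order interaction between i and j (assumed definition):
    the mean of delta_f over every context S of size m drawn from the
    other n - 2 variables."""
    others = [k for k in range(n) if k not in (i, j)]
    contexts = [frozenset(S) for S in combinations(others, m)]
    return sum(delta_f(f, i, j, S) for S in contexts) / len(contexts)


N_VARS = 5  # hypothetical toy input with 5 variables


def toy_f(S):
    """Hypothetical score: a pairwise term for {0, 1} plus a term that
    fires only when all N_VARS variables are present."""
    pairwise = 1.0 if {0, 1} <= S else 0.0
    all_present = 1.0 if len(S) == N_VARS else 0.0
    return pairwise + all_present


if __name__ == "__main__":
    # Orders range from 0 to n - 2 (the context can hold at most n - 2 variables).
    for m in range(N_VARS - 1):
        print(m, interaction_order_m(toy_f, N_VARS, 0, 1, m))
```

In this toy, the pairwise term contributes 1.0 at every order, while the all-present term appears only at the highest order (m = 3 for five variables), giving 2.0 there. This separation of a signal carried by small contexts from one carried only by nearly complete contexts is the low-order versus high-order distinction the abstract refers to.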

updated: Tue Nov 09 2021 13:29:01 GMT+0000 (UTC)

published: Fri Mar 12 2021 15:56:28 GMT+0000 (UTC)

arXiv