A Practical Upper Bound for the Worst-Case Attribution Deviations

Fan Wang; Adams Wai-Kin Kong

Model attribution is a critical component of deep neural networks (DNNs) because it makes complex models interpretable. Recent studies have drawn attention to the security of attribution methods, which are vulnerable to attribution attacks that generate similar images with dramatically different attributions. Existing work has empirically improved the robustness of DNNs against such attacks; however, none of it explicitly quantifies the actual deviations of attributions. In this work, for the first time, a constrained optimization problem is formulated to derive an upper bound on the largest dissimilarity between attributions after a sample is perturbed by any noise within a given region while the classification result remains the same. Based on this formulation, different practical approaches are introduced to bound the attribution deviations from above, using Euclidean distance and cosine similarity under both ℓ_2- and ℓ_∞-norm perturbation constraints. The bounds developed in our theoretical study are validated on various datasets and against two different types of attacks (PGD attack and IFIA attribution attack). Over 10 million attacks in the experiments indicate that the proposed upper bounds effectively quantify the robustness of models based on the worst-case attribution dissimilarities.
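To make the quantity being bounded concrete, the following is a minimal sketch, not the authors' method. It assumes a hypothetical two-layer tanh network (weights `W1`, `W2`), plain gradient saliency as the attribution method (the abstract does not say which method is used), and uniform random sampling inside an ℓ_2 ball in place of the PGD and IFIA attacks. It records the largest Euclidean distance and cosine dissimilarity between the original and perturbed attributions over perturbations that leave the predicted class unchanged. Random sampling can only underestimate the worst case, so any valid upper bound should sit at or above these values.

```python
import math
import random

def hidden(x, W1):
    # Hidden activations tanh(W1 x) of the hypothetical toy network.
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]

def logits(x, W1, W2):
    h = hidden(x, W1)
    return [sum(w * hj for w, hj in zip(row, h)) for row in W2]

def predict(x, W1, W2):
    z = logits(x, W1, W2)
    return max(range(len(z)), key=z.__getitem__)

def saliency(x, W1, W2, label):
    # Gradient of logit[label] w.r.t. x (assumed attribution method):
    # d/dx_i = sum_j W2[label][j] * (1 - tanh^2(.)_j) * W1[j][i]
    h = hidden(x, W1)
    coeff = [W2[label][j] * (1.0 - h[j] ** 2) for j in range(len(h))]
    return [sum(coeff[j] * W1[j][i] for j in range(len(h))) for i in range(len(x))]

def l2_distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def cosine_distance(a, b):
    # 1 - cosine similarity; the small constant guards against zero norms.
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = math.sqrt(sum(ai * ai for ai in a))
    nb = math.sqrt(sum(bi * bi for bi in b))
    return 1.0 - dot / (na * nb + 1e-12)

def empirical_worst_case(x, W1, W2, eps, n_samples=2000, seed=0):
    """Largest observed attribution deviation over label-preserving
    perturbations sampled uniformly from the l2 ball of radius eps."""
    rng = random.Random(seed)
    label = predict(x, W1, W2)
    base = saliency(x, W1, W2, label)
    worst_l2 = worst_cos = 0.0
    for _ in range(n_samples):
        d = [rng.gauss(0.0, 1.0) for _ in x]            # random direction
        norm = math.sqrt(sum(di * di for di in d))
        radius = eps * rng.random() ** (1.0 / len(x))   # uniform in the ball
        x_pert = [xi + radius * di / norm for xi, di in zip(x, d)]
        if predict(x_pert, W1, W2) != label:
            continue  # constraint: the classification result must not change
        g = saliency(x_pert, W1, W2, label)
        worst_l2 = max(worst_l2, l2_distance(g, base))
        worst_cos = max(worst_cos, cosine_distance(g, base))
    return worst_l2, worst_cos

# Hypothetical weights and input, chosen only for illustration.
W1 = [[0.5, -1.0, 0.3], [1.2, 0.4, -0.7]]
W2 = [[1.0, -0.5], [-0.8, 0.9]]
x = [0.2, -0.1, 0.4]
worst_l2, worst_cos = empirical_worst_case(x, W1, W2, eps=0.5)
```

An ℓ_∞ variant would sample each coordinate of the perturbation independently from [-eps, eps] instead of sampling from the ball; a gradient-based attack would replace the sampling loop entirely and typically find larger deviations.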

updated: Wed Mar 01 2023 09:07:27 GMT+0000 (UTC)

published: Wed Mar 01 2023 09:07:27 GMT+0000 (UTC)

arXiv
