MORA: Improving Ensemble Robustness Evaluation with Model-Reweighing Attack

Yunrui Yu; Xitong Gao; Cheng-Zhong Xu

Adversarial attacks can deceive neural networks by adding tiny perturbations to their input data. Ensemble defenses, which are trained to minimize attack transferability among sub-models, offer a promising research direction for improving robustness against such attacks while maintaining high accuracy on natural inputs. We discover, however, that recent state-of-the-art (SOTA) adversarial attack strategies cannot reliably evaluate ensemble defenses and substantially overestimate their robustness. This paper identifies two factors that contribute to this behavior. First, these defenses form ensembles that are notably difficult for existing gradient-based methods to attack, due to gradient obfuscation. Second, ensemble defenses diversify sub-model gradients, which makes it challenging to defeat all sub-models simultaneously: simply summing their contributions may counteract the overall attack objective. Yet we observe that an ensemble may still be fooled even when most of its sub-models are correct. We therefore introduce MORA, a model-reweighing attack that steers adversarial example synthesis by reweighing the importance of sub-model gradients. MORA finds that recent ensemble defenses all exhibit varying degrees of overestimated robustness. Compared with recent SOTA white-box attacks, MORA converges orders of magnitude faster while achieving higher attack success rates across all ensemble models examined under three different ensemble modes (i.e., ensembling by softmax, voting, or logits). In particular, most ensemble defenses exhibit near or exactly 0% robustness against MORA with ℓ∞ perturbations within 0.02 on CIFAR-10 and 0.01 on CIFAR-100. We make MORA open source with reproducible results and pre-trained models, and provide a leaderboard of ensemble defenses under various attack strategies.
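The observation that an ensemble can be fooled while most of its sub-models are still correct can be illustrated with a toy example. The sketch below is not MORA and is not taken from the paper's code. It only assumes one plausible reading of the three ensemble modes named in the abstract (averaging softmax outputs, majority voting, averaging logits). The function names and the logit values are hypothetical and chosen for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    # Index of the largest value (first one on ties).
    return max(range(len(values)), key=lambda i: values[i])

def ensemble_predict(sub_logits, mode):
    # sub_logits: one list of class logits per sub-model.
    # The mode definitions are assumptions made for this sketch.
    n_models = len(sub_logits)
    n_classes = len(sub_logits[0])
    if mode == "logits":
        scores = [sum(l[c] for l in sub_logits) / n_models
                  for c in range(n_classes)]
    elif mode == "softmax":
        probs = [softmax(l) for l in sub_logits]
        scores = [sum(p[c] for p in probs) / n_models
                  for c in range(n_classes)]
    elif mode == "voting":
        scores = [0] * n_classes
        for l in sub_logits:
            scores[argmax(l)] += 1
    else:
        raise ValueError(f"unknown ensemble mode: {mode}")
    return argmax(scores)

# Hypothetical logits for an input whose true class is 0.
# Two sub-models are weakly correct; one is confidently wrong.
sub_logits = [
    [0.2, 0.0, 0.0],
    [0.2, 0.0, 0.0],
    [0.0, 5.0, 0.0],
]

for mode in ("softmax", "voting", "logits"):
    print(mode, ensemble_predict(sub_logits, mode))
```

In this toy case two of the three sub-models predict the true class, so majority voting returns class 0, while the softmax-averaged and logit-averaged ensembles both return class 1 because the single confident sub-model dominates the average. This suggests why weighting sub-models unequally during an attack could matter, although how MORA actually assigns those weights is not specified in the abstract.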

updated: Tue Nov 15 2022 09:45:32 GMT+0000 (UTC)

published: Tue Nov 15 2022 09:45:32 GMT+0000 (UTC)

arXiv
