Automated Discovery of Adaptive Attacks on Adversarial Defenses

Chengyuan Yao; Pavol Bielik; Petar Tsankov; Martin Vechev

Reliable evaluation of adversarial defenses is a challenging task. It is currently limited either to an expert who manually crafts attacks that exploit the defense's inner workings, or to approaches based on an ensemble of fixed attacks, none of which may be effective for the specific defense at hand. Our key observation is that adaptive attacks are composed of reusable building blocks that can be formalized in a search space and used to automatically discover attacks for unknown defenses. We evaluate our approach on 24 adversarial defenses and show that it outperforms AutoAttack, the current state-of-the-art tool for reliable evaluation of adversarial defenses: our tool discovers significantly stronger attacks, producing 3.0%-50.8% additional adversarial examples for 10 models, while obtaining attacks of slightly stronger or similar strength for the remaining models.
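
The idea of composing attacks from reusable building blocks and searching over them can be illustrated with a minimal sketch. This is not the authors' tool or search space: the toy linear "defense", the block names (`sign_direction`, `l2_direction`), the `SEARCH_SPACE` contents, and the exhaustive search in `discover_attack` are all assumptions made for illustration only.

```python
import itertools
import math

# Hypothetical toy "defense": a fixed linear binary classifier on 2-D inputs.
W = (1.0, -2.0)
B = 0.5

def score(x):
    return W[0] * x[0] + W[1] * x[1] + B

def predict(x):
    return 1 if score(x) > 0 else 0

def loss_gradient(y):
    # Gradient w.r.t. x of the loss -s * score(x), with s = +1 for label 1
    # and s = -1 for label 0. For a linear model it does not depend on x.
    s = 1.0 if y == 1 else -1.0
    return tuple(-s * w for w in W)

# Building block (assumed): how a gradient becomes an update direction.
def sign_direction(g):
    return tuple(float((v > 0) - (v < 0)) for v in g)

def l2_direction(g):
    norm = math.sqrt(sum(v * v for v in g)) or 1.0
    return tuple(v / norm for v in g)

def run_attack(x, y, eps, n_steps, step_size, direction):
    """Iterative gradient ascent on the loss, clipped to the L-inf ball of radius eps."""
    adv = x
    for _ in range(n_steps):
        d = direction(loss_gradient(y))
        adv = tuple(a + step_size * di for a, di in zip(adv, d))
        adv = tuple(min(max(a, xi - eps), xi + eps) for a, xi in zip(adv, x))
    return adv

def robust_accuracy(data, eps, config):
    """Fraction of points still classified correctly after the attack."""
    n_steps, step_size, direction = config
    correct = sum(
        predict(run_attack(x, y, eps, n_steps, step_size, direction)) == y
        for x, y in data
    )
    return correct / len(data)

# Hypothetical search space: each attack is one choice per building block.
SEARCH_SPACE = {
    "n_steps": [1, 5],
    "step_size": [0.05, 0.3],
    "direction": [sign_direction, l2_direction],
}

def discover_attack(data, eps):
    """Exhaustive search: return the configuration with the lowest robust accuracy."""
    configs = itertools.product(
        SEARCH_SPACE["n_steps"], SEARCH_SPACE["step_size"], SEARCH_SPACE["direction"]
    )
    return min(configs, key=lambda c: robust_accuracy(data, eps, c))

# Toy labelled points, all classified correctly by the undefended model.
DATA = [((1.0, 0.0), 1), ((0.0, 1.0), 0), ((0.2, 0.1), 1),
        ((-0.3, 0.3), 0), ((2.0, -1.0), 1)]
EPS = 0.3

best_config = discover_attack(DATA, EPS)
best_acc = robust_accuracy(DATA, EPS, best_config)
print(best_config[0], best_config[1], best_config[2].__name__, best_acc)
```

In this sketch the search is a plain enumeration because the space has only eight configurations. A realistic search space over attack algorithms, loss functions, and their parameters would be far larger and would likely need a budgeted search strategy instead.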

updated: Wed Oct 27 2021 08:26:18 GMT+0000 (UTC)

published: Tue Feb 23 2021 18:43:24 GMT+0000 (UTC)

arXiv
