Universal Adversarial Training

Ali Shafahi; Mahyar Najibi; Zheng Xu; John Dickerson; Larry S. Davis; Tom Goldstein

Standard adversarial attacks change the predicted class label of a selected image by adding specially tailored small perturbations to its pixels. In contrast, a universal perturbation is an update that can be added to any image in a broad class of images while still changing the predicted class label. We study the efficient generation of universal adversarial perturbations, as well as efficient methods for hardening networks against these attacks. We propose a simple optimization-based universal attack that reduces the top-1 accuracy of various network architectures on ImageNet to less than 20%, while learning the universal perturbation 13x faster than the standard method. To defend against these perturbations, we propose universal adversarial training, which models the problem of robust classifier generation as a two-player min-max game and produces robust models at only 2x the cost of natural training. We also propose a simultaneous stochastic gradient method that is almost free of extra computation, which allows us to do universal adversarial training on ImageNet.
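The min-max game and the simultaneous gradient idea described in the abstract can be illustrated with a toy sketch. The code below is not the authors' implementation: it assumes a two-dimensional logistic-regression classifier, an l-infinity bound `eps` on the perturbation, and a signed-gradient ascent step for the perturbation, none of which are specified in the abstract. All function names (`make_blobs`, `universal_adversarial_training`, `accuracy`) and hyperparameters are hypothetical. The point it shows is that a single shared perturbation `delta` and the model weights are both updated from the same forward and backward pass on each minibatch.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sign(v):
    return (v > 0) - (v < 0)


def make_blobs(n_per_class, seed=0):
    # Hypothetical toy data: two Gaussian blobs standing in for image classes.
    rng = random.Random(seed)
    xs, ys = [], []
    for label, center in ((1, 2.0), (0, -2.0)):
        for _ in range(n_per_class):
            xs.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
            ys.append(label)
    return xs, ys


def universal_adversarial_training(data, labels, eps=0.5, lr_w=0.1,
                                   lr_delta=0.02, epochs=50,
                                   batch_size=16, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    w, b = [0.0] * dim, 0.0
    delta = [0.0] * dim  # one perturbation shared by every input
    idx = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            batch = idx[start:start + batch_size]
            grad_w, grad_b = [0.0] * dim, 0.0
            grad_delta = [0.0] * dim
            for i in batch:
                x_adv = [xj + dj for xj, dj in zip(data[i], delta)]
                z = sum(wj * xj for wj, xj in zip(w, x_adv)) + b
                err = sigmoid(z) - labels[i]  # d(cross-entropy)/dz
                for j in range(dim):
                    grad_w[j] += err * x_adv[j]
                    grad_delta[j] += err * w[j]
                grad_b += err
            n = len(batch)
            # Simultaneous step: both players reuse the same gradients.
            # The model descends on the loss (min player) ...
            w = [wj - lr_w * g / n for wj, g in zip(w, grad_w)]
            b -= lr_w * grad_b / n
            # ... while the perturbation ascends (max player) and is
            # projected back onto the assumed l-infinity ball.
            delta = [max(-eps, min(eps, dj + lr_delta * sign(g)))
                     for dj, g in zip(delta, grad_delta)]
    return w, b, delta


def accuracy(xs, ys, w, b, delta):
    correct = 0
    for x, y in zip(xs, ys):
        z = sum(wj * (xj + dj) for wj, xj, dj in zip(w, x, delta)) + b
        correct += int((z > 0) == (y == 1))
    return correct / len(xs)


xs, ys = make_blobs(100)
w, b, delta = universal_adversarial_training(xs, ys, eps=0.5)
print("universal perturbation:", delta)
print("accuracy under perturbation:", accuracy(xs, ys, w, b, delta))
```

Because the perturbation gradient is accumulated in the same loop as the weight gradient, the extra cost per minibatch in this sketch is small, which is the intuition behind the abstract's "almost free" claim. How closely this mirrors the paper's actual update rule is an assumption.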

updated: Wed Nov 20 2019 20:57:36 GMT+0000 (UTC)

published: Tue Nov 27 2018 23:09:27 GMT+0000 (UTC)

arXiv
