Generating Unrestricted Adversarial Examples via Three Parameters

Hanieh Naderi; Leili Goli; Shohreh Kasaei

Deep neural networks have been shown to be vulnerable to adversarial examples, inputs deliberately constructed to make a victim model misclassify. Because most adversarial examples restrict their perturbations to an L_p-norm bound, existing defense methods have focused on these types of perturbations, and less attention has been paid to unrestricted adversarial examples, which can create more realistic attacks that deceive models without affecting human predictions. To address this problem, the proposed adversarial attack generates an unrestricted adversarial example with a limited number of parameters. The attack selects three points on the input image and, based on their locations, transforms the image into an adversarial example. By limiting the range of movement and location of these three points and by using a discriminatory network, the proposed unrestricted adversarial example preserves the image appearance. Experimental results show that the proposed adversarial examples obtain an average success rate of 93.5% in terms of human evaluation on the MNIST and SVHN datasets. The attack also reduces model accuracy by an average of 73% on six datasets: MNIST, FMNIST, SVHN, CIFAR10, CIFAR100, and ImageNet. Note that, for an attack, lower accuracy of the victim model denotes a more successful attack. Adversarial training with the attack also improves model robustness against randomly transformed images.
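
To make the three-point idea concrete, here is a minimal sketch under a stated assumption: the abstract does not say how the three points define the transformation, so this example assumes they are used as correspondences for a 2D affine map (three point pairs determine one uniquely). The function names `affine_from_three_points` and `warp_affine_nearest` are hypothetical and are not taken from the paper; the search over point positions and the discriminatory network are not modeled.

```python
import numpy as np

def affine_from_three_points(src, dst):
    """Return the 2x3 matrix A with A @ [x, y, 1] = [x', y'] for three point pairs.

    Assumption: the three points act as affine correspondences; the
    abstract does not specify the transformation family.
    """
    src = np.asarray(src, dtype=float)          # shape (3, 2), (x, y) per row
    dst = np.asarray(dst, dtype=float)          # shape (3, 2)
    src_h = np.hstack([src, np.ones((3, 1))])   # homogeneous coordinates, (3, 3)
    # src_h @ A.T = dst, so A.T is the solution of a 3x3 linear system.
    # The source points must not be collinear, or the system is singular.
    return np.linalg.solve(src_h, dst).T        # shape (2, 3)

def warp_affine_nearest(image, A):
    """Apply A to an image using inverse mapping and nearest-neighbor sampling."""
    h, w = image.shape[:2]
    inv = np.linalg.inv(np.vstack([A, [0.0, 0.0, 1.0]]))
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    sx, sy = (inv @ coords)[:2]                 # source location of each output pixel
    sx = np.rint(sx).astype(int)
    sy = np.rint(sy).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    flat = image.reshape(h * w, -1)
    out = np.zeros_like(flat)                   # pixels mapped from outside stay zero
    out[valid] = flat[sy[valid] * w + sx[valid]]
    return out.reshape(image.shape)

# Hypothetical usage: move all three control points one pixel to the right.
src_points = [(0, 0), (1, 0), (0, 1)]
dst_points = [(1, 0), (2, 0), (1, 1)]
A = affine_from_three_points(src_points, dst_points)
image = np.arange(16).reshape(4, 4)
shifted = warp_affine_nearest(image, A)
```

In an attack setting, the destination points would be chosen by an optimizer within a bounded neighborhood of the source points, which is one way to read the abstract's "limiting the range of movement"; that search loop is omitted here.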

updated: Sat Mar 13 2021 07:20:14 GMT+0000 (UTC)

published: Sat Mar 13 2021 07:20:14 GMT+0000 (UTC)

arXiv
