Transferable Universal Adversarial Perturbations Using Generative Models

Atiye Sadat Hashemi; Andreas Bär; Saeed Mozaffari; Tim Fingscheidt

Deep neural networks tend to be vulnerable to adversarial perturbations, which, when added to a natural image, can fool the respective model with high confidence. Recently, the existence of image-agnostic perturbations, also known as universal adversarial perturbations (UAPs), was discovered. However, existing UAPs still lack a sufficiently high fooling rate when applied to an unknown target model. In this paper, we propose a novel deep learning technique for generating more transferable UAPs. We use a perturbation generator and several given pretrained networks, so-called source models, to generate UAPs on the ImageNet dataset. Because various model architectures have similar feature representations in the first layer, we propose a loss formulation that focuses on the adversarial energy only in the respective first layer of the source models. This supports the transferability of our generated UAPs to other target models. We further analyze our generated UAPs empirically and demonstrate that these perturbations generalize very well to different target models. By surpassing the current state of the art in both fooling rate and model transferability, we show the superiority of our proposed approach. Using our generated non-targeted UAPs, we obtain an average fooling rate of 93.36% on the source models (state of the art: 82.16%). When generating our UAPs on the deep ResNet-152, we obtain an absolute fooling-rate advantage of about 12 percentage points over state-of-the-art methods on the VGG-16 and VGG-19 target models.
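The abstract names two quantities without giving formulas: the fooling rate and a loss on first-layer activations. Below is a minimal Python sketch of both under stated assumptions. It is not the authors' implementation. `fooling_rate` uses the common non-targeted definition: the share of images whose predicted label changes once the UAP is added. The abstract does not spell that definition out. `first_layer_loss` is a hypothetical stand-in for the paper's first-layer loss. It assumes the "adversarial energy" can be measured as the mean squared difference between clean and perturbed first-layer activations. That energy is averaged over the source models and negated, so minimizing the loss pushes the activations apart. Images are flattened to lists of floats, and each first layer is any callable on such lists. The perturbation generator and the ImageNet training loop are omitted.

```python
from typing import Callable, Sequence

# Hypothetical interface: a source model's first layer, mapping a flattened
# image to a flattened feature map.
FirstLayer = Callable[[Sequence[float]], Sequence[float]]


def fooling_rate(clean_labels: Sequence[int], perturbed_labels: Sequence[int]) -> float:
    """Share of images whose predicted label changes once the UAP is added."""
    if not clean_labels or len(clean_labels) != len(perturbed_labels):
        raise ValueError("expected two non-empty label sequences of equal length")
    flipped = sum(c != p for c, p in zip(clean_labels, perturbed_labels))
    return flipped / len(clean_labels)


def first_layer_loss(
    first_layers: Sequence[FirstLayer],
    image: Sequence[float],
    uap: Sequence[float],
) -> float:
    """Assumed loss: negative mean squared change of first-layer activations,
    averaged over all source models. Minimizing it maximizes the change."""
    perturbed = [x + d for x, d in zip(image, uap)]
    energy = 0.0
    for layer in first_layers:
        clean_feat = layer(image)
        adv_feat = layer(perturbed)
        squared = [(a - c) ** 2 for a, c in zip(adv_feat, clean_feat)]
        energy += sum(squared) / len(squared)
    return -energy / len(first_layers)


if __name__ == "__main__":
    # Toy stand-ins for two source models' first layers (not real networks).
    def scale_by_two(v: Sequence[float]) -> list[float]:
        return [2.0 * x for x in v]

    def relu(v: Sequence[float]) -> list[float]:
        return [max(0.0, x) for x in v]

    print(fooling_rate([1, 2, 3, 4], [1, 5, 3, 0]))  # 0.5
    print(first_layer_loss([scale_by_two, relu], [0.0, 1.0], [0.5, -0.5]))  # -0.625
```

Negating the energy lets a generator be trained with an ordinary minimizer. The toy layers in the demo only illustrate the arithmetic. In practice, the first-layer activations would come from pretrained source models such as ResNet-152 or VGG-16.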

updated: Thu Oct 29 2020 15:19:41 GMT+0000 (UTC)

published: Wed Oct 28 2020 12:31:59 GMT+0000 (UTC)

arXiv