Searching for the Essence of Adversarial Perturbations

Dennis Y. Menn; Tzu-hsun Feng; Hung-yi Lee

Neural networks have demonstrated state-of-the-art performance in various machine learning fields. However, malicious perturbations introduced into input data, which yield what are known as adversarial examples, have been shown to deceive neural network predictions. This poses potential risks for real-world applications such as autonomous driving and text identification. To mitigate these risks, a comprehensive understanding of the mechanisms underlying adversarial examples is essential. In this study, we demonstrate that adversarial perturbations contain human-recognizable information, and that this information is the key factor responsible for a neural network's incorrect prediction. This contrasts with the widely held belief that characteristics humans cannot identify play the critical role in fooling a network. The concept of human-recognizable characteristics enables us to explain key features of adversarial perturbations, including their existence, their transferability among different neural networks, and the increased interpretability associated with adversarial training. We also uncover two unique properties of adversarial perturbations that deceive neural networks: masking and generation. Additionally, we identify a special class, the complementary class, when neural networks classify input images. The presence of human-recognizable information in adversarial perturbations allows researchers to gain insight into the working principles of neural networks and may lead to the development of techniques for detecting and defending against adversarial attacks.

updated: Fri Feb 03 2023 10:38:51 GMT+0000 (UTC)

published: Mon May 30 2022 18:04:57 GMT+0000 (UTC)

arXiv
