Benign Adversarial Attack: Tricking Models for Goodness

Jitao Sang; Xian Zhao; Jiaming Zhang; Zhiyu Lin

Despite successful application in many fields, today's machine learning models suffer from notorious problems such as vulnerability to adversarial examples. Rather than joining the cat-and-mouse game between adversarial attack and defense, this paper offers an alternative perspective on adversarial examples and explores whether they can be exploited in benign applications. We first attribute adversarial examples to the disparity between humans and models in their use of non-semantic features. Although largely ignored in classical machine learning mechanisms, non-semantic features have three interesting characteristics: they are (1) exclusive to models, (2) critical to inference, and (3) utilizable as features. Inspired by this, we present the brave new idea of benign adversarial attack, which exploits adversarial examples for good in three directions: (1) adversarial Turing test, (2) rejecting malicious model applications, and (3) adversarial data augmentation. Each direction is presented with a motivation, a justification analysis, and prototype applications that showcase its potential.

updated: Tue Jul 05 2022 14:25:20 GMT+0000 (UTC)

published: Mon Jul 26 2021 06:46:19 GMT+0000 (UTC)

arXiv
