Benford's law: what does it say on adversarial images?

João G. Zago; Fabio L. Baldissera; Eric A. Antonelo; Rodrigo T. Saad

Convolutional neural networks (CNNs) are vulnerable to small perturbations in their input images. These networks are thus prone to malicious attacks that perturb the inputs to force a misclassification. Such slightly manipulated images, crafted to deceive the classifier, are known as adversarial images. In this work, we investigate statistical differences between natural images and adversarial ones. More precisely, we show that, under a suitable image transformation and for a class of adversarial attacks, the distribution of the leading digit of the pixel values in adversarial images deviates from Benford's law. The stronger the attack, the farther the resulting distribution is from Benford's law. Our analysis provides a detailed investigation of this new approach, which can serve as a basis for alternative adversarial example detection methods that neither need to modify the original CNN classifier nor use the raw high-dimensional pixels as features to defend against attacks.
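To make the leading-digit idea concrete, the following is a minimal sketch and not the authors' implementation. Benford's law assigns probability log10(1 + 1/d) to a leading digit d in 1..9. The abstract does not name the image transformation or the distance measure, so the sketch assumes an L1 finite-difference gradient magnitude as the transformation and the Kullback-Leibler divergence as the measure of deviation. All function names are hypothetical, and pixel values are assumed to be integers.

```python
import math


def benford_pmf():
    """Benford's law: P(d) = log10(1 + 1/d) for d = 1..9."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def gradient_magnitude(image):
    """L1 finite-difference gradient magnitude of a 2D integer image.

    Assumption: this stands in for the 'proper image transformation'
    mentioned in the abstract, which does not specify the transform.
    """
    rows, cols = len(image), len(image[0])
    values = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            dx = image[r][c + 1] - image[r][c]
            dy = image[r + 1][c] - image[r][c]
            values.append(abs(dx) + abs(dy))
    return values


def leading_digit_distribution(values):
    """Empirical distribution of the first digit of positive integers."""
    counts = [0] * 9
    for v in values:
        if v > 0:  # zeros have no leading digit and are skipped
            counts[int(str(v)[0]) - 1] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no positive values to analyse")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q); q is assumed strictly positive, as Benford's pmf is."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def benford_deviation(image):
    """Deviation of the transformed image's leading digits from Benford's law."""
    empirical = leading_digit_distribution(gradient_magnitude(image))
    return kl_divergence(empirical, benford_pmf())
```

Under these assumptions, a detector would compute `benford_deviation` for an incoming image and compare it against scores observed on natural images. How such a threshold is chosen is outside what the abstract describes.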

updated: Mon Mar 06 2023 00:29:14 GMT+0000 (UTC)

published: Tue Feb 09 2021 02:50:29 GMT+0000 (UTC)

arXiv
