Good Artists Copy, Great Artists Steal: Model Extraction Attacks Against Image Translation Models

Sebastian Szyller; Vasisht Duddu; Tommi Gröndahl; N. Asokan

Machine learning models are typically made available to potential client users via inference APIs. Model extraction attacks occur when a malicious client uses information gleaned from queries to the inference API of a victim model F_V to build a surrogate model F_A with comparable functionality. Recent research has shown successful model extraction of image classification and natural language processing models. In this paper, we show the first model extraction attack against real-world generative adversarial network (GAN) image translation models. We present a framework for conducting such attacks, and show that an adversary can successfully extract functional surrogate models by querying F_V using data from the same domain as the training data for F_V. The adversary need not know F_V's architecture or any other information about it beyond its intended task. We evaluate the effectiveness of our attacks using three different instances of two popular categories of image translation: (1) Selfie-to-Anime and (2) Monet-to-Photo (image style transfer), and (3) Super-Resolution (super resolution). Using standard performance metrics for GANs, we show that our attacks are effective. Furthermore, we conducted a large-scale (125 participants) user study on Selfie-to-Anime and Monet-to-Photo to show that human perception of the images produced by F_V and F_A can be considered equivalent, within an equivalence bound of Cohen's d = 0.3. Finally, we show that existing defenses against model extraction attacks (watermarking, adversarial examples, poisoning) do not extend to image translation models.
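The query-then-train loop described in the abstract can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: `victim_api` is a hypothetical stand-in for the black-box F_V (a secret per-pixel affine map, not a GAN), "images" are flat lists of pixel intensities, and the surrogate F_A is fitted by ordinary least squares rather than by GAN training as in the paper. Only the shape of the attack carries over: query F_V with same-domain inputs, record the outputs, and fit F_A on the resulting pairs.

```python
import random

def victim_api(image):
    """Hypothetical black-box F_V: a secret per-pixel affine map (toy stand-in for a GAN)."""
    return [0.8 * p + 0.1 for p in image]

def collect_pairs(query_images):
    """Query F_V and keep (input, output) pairs; this is all the attacker observes."""
    return [(img, victim_api(img)) for img in query_images]

def fit_surrogate(pairs):
    """Fit F_A(p) = a*p + b by ordinary least squares over all pixel pairs."""
    xs = [p for img, _ in pairs for p in img]
    ys = [q for _, out in pairs for q in out]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return lambda image: [a * p + b for p in image]

# Attacker's query set: random inputs assumed to come from the same domain as F_V's training data.
rng = random.Random(0)
queries = [[rng.random() for _ in range(16)] for _ in range(50)]
surrogate = fit_surrogate(collect_pairs(queries))

# Compare F_V and F_A on an input that was never used as a query.
test_image = [rng.random() for _ in range(16)]
max_gap = max(abs(u - v) for u, v in zip(victim_api(test_image), surrogate(test_image)))
print(f"max per-pixel gap between F_V and F_A: {max_gap:.2e}")
```

The attacker code never reads the victim's coefficients; it only calls `victim_api`, which mirrors the black-box assumption that the adversary knows nothing about F_V beyond its intended task.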

updated: Tue Feb 28 2023 09:37:59 GMT+0000 (UTC)

published: Mon Apr 26 2021 14:50:59 GMT+0000 (UTC)

arXiv
