Cracking White-box DNN Watermarks via Invariant Neuron Transforms

Yifan Yan; Xudong Pan; Yining Wang; Mi Zhang; Min Yang

How to protect the Intellectual Property (IP) of deep neural networks (DNNs) has recently become a major concern for the AI industry. To combat potential model piracy, recent works explore various watermarking strategies that embed secret identity messages into either the prediction behaviors or the internals (e.g., weights and neuron activations) of the target model. Sacrificing less functionality and involving more knowledge about the target model, the latter branch of watermarking schemes (i.e., white-box model watermarking) is claimed to be accurate, credible and secure against most known watermark removal attacks, and is attracting growing research effort and industrial application. In this paper, we present the first effective removal attack that cracks almost all existing white-box watermarking schemes with provably no performance overhead and no required prior knowledge. By analyzing these IP protection mechanisms at the granularity of neurons, we discover for the first time their common dependence on a set of fragile features of a local neuron group, all of which can be arbitrarily tampered with by our proposed chain of invariant neuron transforms. On 9 state-of-the-art white-box watermarking schemes and a broad set of industry-level DNN architectures, our attack is the first to reduce the identity message embedded in the protected models to almost random. Meanwhile, unlike known removal attacks, our attack requires no prior knowledge of the training data distribution or the adopted watermarking algorithms, and leaves model functionality intact.
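To make the idea of an invariant neuron transform concrete, the sketch below shows two well-known function-preserving rewrites of a small ReLU network: permuting the hidden neurons and rescaling each one by a positive factor while compensating in the next layer. It is an illustration only, not the authors' implementation. The abstract does not say which transforms make up the proposed chain, so the choice of these two, the two-layer architecture, and every name in the code (`forward`, `permute_and_scale`, `perm`, `scales`) are assumptions.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    # W is a list of rows; returns W @ x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(W1, b1, W2, x):
    # Two-layer network: y = W2 @ relu(W1 @ x + b1)
    h = relu([z + b for z, b in zip(matvec(W1, x), b1)])
    return matvec(W2, h)

def permute_and_scale(W1, b1, W2, perm, scales):
    """Hypothetical helper: new hidden neuron i is old neuron perm[i],
    scaled by scales[i] > 0. Because relu(s * z) == s * relu(z) for s > 0,
    dividing the matching column of W2 by scales[i] cancels the scaling."""
    new_W1 = [[scales[i] * w for w in W1[p]] for i, p in enumerate(perm)]
    new_b1 = [scales[i] * b1[p] for i, p in enumerate(perm)]
    new_W2 = [[row[p] / scales[i] for i, p in enumerate(perm)] for row in W2]
    return new_W1, new_b1, new_W2

rng = random.Random(0)
n_in, n_hidden, n_out = 4, 6, 3
W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
W2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]

perm = list(range(n_hidden))
rng.shuffle(perm)
scales = [rng.uniform(0.5, 2.0) for _ in range(n_hidden)]
W1t, b1t, W2t = permute_and_scale(W1, b1, W2, perm, scales)

x = [rng.uniform(-1, 1) for _ in range(n_in)]
y = forward(W1, b1, W2, x)
yt = forward(W1t, b1t, W2t, x)

# The outputs agree up to floating-point rounding, yet the weights differ.
max_output_gap = max(abs(a - b) for a, b in zip(y, yt))
weights_changed = W1 != W1t
print(max_output_gap, weights_changed)
```

In this toy setting the transformed network computes the same function as the original, while any identity message read from specific weight positions or magnitudes of the hidden layer would no longer line up. That is the kind of fragility the abstract attributes to features of a local neuron group.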

updated: Thu May 19 2022 07:28:53 GMT+0000 (UTC)

published: Sat Apr 30 2022 08:33:32 GMT+0000 (UTC)

arXiv
