Rickrolling the Artist: Injecting Invisible Backdoors into Text-Guided Image Generation Models

Lukas Struppek; Dominik Hintersdorf; Kristian Kersting

While text-to-image synthesis currently enjoys great popularity among researchers and the general public, the security of these models has been neglected so far. Many text-guided image generation models rely on pre-trained text encoders from external sources, and their users trust that the retrieved models will behave as promised. Unfortunately, this might not be the case. We introduce backdoor attacks against text-guided generative models and demonstrate that their text encoders pose a major tampering risk. Our attacks alter an encoder only slightly, so that image generation with clean prompts reveals no suspicious model behavior. By then inserting a single non-Latin character into the prompt, the adversary can trigger the model to generate either images with pre-defined attributes or images that follow a hidden, potentially malicious description. We empirically demonstrate the high effectiveness of our attacks on Stable Diffusion and highlight that injecting a single backdoor takes less than two minutes. Beyond its use as an attack, our approach can also force an encoder to forget phrases related to certain concepts, such as nudity or violence, and thereby help to make image generation safer.
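To make the trigger mechanism in the abstract concrete, the following minimal sketch shows how a single non-Latin character can hide inside an otherwise ordinary prompt, and how such characters could be flagged before the prompt reaches a text encoder. The choice of the Cyrillic letter "о" (U+043E) as the trigger, the example prompt, and the helper name `non_latin_chars` are assumptions made for illustration; the abstract does not specify which character is used, and this check is not part of the described method.

```python
import unicodedata


def non_latin_chars(prompt: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for each letter not in the Latin script.

    Hypothetical helper for illustration; it relies only on Unicode character names.
    """
    hits = []
    for i, ch in enumerate(prompt):
        if ch.isalpha():
            name = unicodedata.name(ch, "UNKNOWN")
            if not name.startswith("LATIN"):
                hits.append((i, ch, name))
    return hits


clean = "A photo of a dog"
# Assumed trigger: replace the first Latin 'o' with the look-alike Cyrillic 'о' (U+043E).
triggered = clean.replace("o", "\u043e", 1)

print(clean == triggered)          # the strings differ although they render alike
print(non_latin_chars(clean))      # no non-Latin letters in the clean prompt
print(non_latin_chars(triggered))  # reports the position and name of the swapped letter
```

The two prompts have the same length and look the same on screen, yet they are different strings and are therefore free to map to different encoder outputs. A script check of this kind is only a coarse filter, since it would also flag legitimate non-Latin prompts.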

updated: Fri Nov 04 2022 12:36:36 GMT+0000 (UTC)

published: Fri Nov 04 2022 12:36:36 GMT+0000 (UTC)

arXiv
