Intriguing Properties of Text-guided Diffusion Models

Qihao Liu; Adam Kortylewski; Yutong Bai; Song Bai; Alan Yuille

Text-guided diffusion models (TDMs) are widely applied but can fail unexpectedly. Common failures include (i) natural-looking text prompts that generate images with the wrong content, and (ii) different random samples of the latent variables that generate vastly different, even unrelated, outputs despite being conditioned on the same text prompt. In this work, we aim to study and understand the failure modes of TDMs in more detail. To achieve this, we propose SAGE, an adversarial attack on TDMs that uses image classifiers as surrogate loss functions to search over the discrete prompt space and the high-dimensional latent space of TDMs, automatically discovering unexpected behaviors and failure cases in image generation. We make several technical contributions to ensure that SAGE finds failure cases of the diffusion model, rather than of the classifier, and verify this in a human study. Our study reveals four intriguing properties of TDMs that have not been systematically studied before: (1) We find a variety of natural text prompts that produce images failing to capture the semantics of the input text. We categorize these failures into ten distinct types based on their underlying causes. (2) We find samples in the latent space (which are not outliers) that lead to distorted images independent of the text prompt, suggesting that parts of the latent space are not well-structured. (3) We also find latent samples that lead to natural-looking images unrelated to the text prompt, implying a potential misalignment between the latent and prompt spaces. (4) By appending a single adversarial token embedding to an input prompt, we can generate a variety of specified target objects while only minimally affecting the CLIP score. This demonstrates the fragility of language representations and raises potential safety concerns. Project page: https://sage-diffusion.github.io/
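
The latent-space search described in the abstract can be illustrated with a toy sketch. This is not the authors' SAGE implementation: `toy_generator` and `toy_classifier_logit` are hypothetical stand-ins for a diffusion model and a surrogate image classifier, and re-projecting the latent onto its original norm is an assumed, simplified proxy for keeping the sample from becoming an outlier. The sketch only shows the general idea of following the gradient of a classifier score back to the latent variable.

```python
import numpy as np

def toy_generator(z, W):
    """Hypothetical stand-in for a generator: maps a latent to 'image' features."""
    return np.tanh(W @ z)

def toy_classifier_logit(x, v):
    """Hypothetical surrogate classifier: logit that the 'image' shows the prompted class."""
    return float(v @ x)

def search_latent(W, v, z0, steps=200, lr=0.05):
    """Lower the classifier logit by gradient descent on the latent.

    After every step the latent is rescaled to the norm of z0 (an assumed
    proxy for staying in a typical, non-outlier region). The latent with the
    lowest logit seen so far is returned.
    """
    radius = np.linalg.norm(z0)
    z = z0.copy()
    best_z = z0.copy()
    best_logit = toy_classifier_logit(toy_generator(z0, W), v)
    for _ in range(steps):
        x = toy_generator(z, W)
        # Chain rule: d(v . tanh(W z)) / dz = W^T (v * (1 - tanh(W z)^2))
        grad = W.T @ (v * (1.0 - x ** 2))
        z = z - lr * grad
        z = z * (radius / np.linalg.norm(z))  # keep the original latent norm
        logit = toy_classifier_logit(toy_generator(z, W), v)
        if logit < best_logit:
            best_logit, best_z = logit, z.copy()
    return best_z

rng = np.random.default_rng(0)
dim_z, dim_x = 16, 8
W = rng.normal(size=(dim_x, dim_z)) / np.sqrt(dim_z)  # toy generator weights
v = rng.normal(size=dim_x)                            # toy classifier weights
z0 = rng.normal(size=dim_z)                           # initial latent sample

z_adv = search_latent(W, v, z0)
logit_before = toy_classifier_logit(toy_generator(z0, W), v)
logit_after = toy_classifier_logit(toy_generator(z_adv, W), v)
print(f"classifier logit before: {logit_before:.3f}, after: {logit_after:.3f}")
```

In a real setting the generator would be a full diffusion sampling chain and the classifier a pretrained image model, so a low classifier score alone would not show that the generator failed; the abstract notes that the authors add technical safeguards and a human study for this reason.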

updated: Sat Aug 19 2023 21:38:53 GMT+0000 (UTC)

published: Thu Jun 01 2023 17:59:00 GMT+0000 (UTC)

arXiv
