Null-text Guidance in Diffusion Models is Secretly a Cartoon-style Creator

Jing Zhao; Heliang Zheng; Chaoyue Wang; Long Lan; Wanrong Huang; Wenjing Yang

Classifier-free guidance is a widely adopted sampling technique for diffusion models. Its main idea is to extrapolate the model's prediction in the direction of text guidance and away from null-text guidance. In this paper, we demonstrate that null-text guidance in diffusion models is secretly a cartoon-style creator, i.e., generated images can be efficiently transformed into cartoons simply by perturbing the null-text guidance. Specifically, we propose two disturbance methods, Rollback disturbance (Back-D) and Image disturbance (Image-D), that construct a misalignment in the sampling process between the noisy images used to predict the null-text guidance and the text guidance (hereafter referred to as the null-text noisy image and the text noisy image, respectively). Back-D achieves cartoonization by altering the noise level of the null-text noisy image, replacing x_t with x_{t+Δt}. Image-D instead produces high-fidelity, diverse cartoons by defining x_t as a clean input image, which further improves the incorporation of finer image details. Through comprehensive experiments, we delve into the principle of noise disturbance for null-text guidance and find that the efficacy of the disturbance depends on the correlation between the null-text noisy image and the source image. Moreover, our proposed techniques, which can both generate cartoon images and cartoonize specific ones, are training-free and easily integrated as a plug-and-play component into any classifier-free guided diffusion model. The project page is available at https://nulltextforcartoon.github.io/.
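To make the misalignment idea concrete, the following is a minimal sketch of one guidance step in which the text branch and the null-text branch receive different input images. It is an illustration based only on the abstract, not the authors' implementation. The names `guided_noise` and `toy_eps_model`, the scalar embeddings, the guidance scale of 7.5, and the way the stand-ins for x_{t+Δt} and the clean image are built are all assumptions. The sketch also passes the same timestep to both branches and reads Image-D as replacing the null-text branch input, neither of which the abstract specifies.

```python
import numpy as np

def guided_noise(eps_model, x_text, x_null, t, text_emb, null_emb, scale):
    """Classifier-free guidance with separate inputs for the two branches.

    With x_null == x_text this is the usual guidance rule; passing a
    different x_null creates the misalignment described in the abstract.
    """
    eps_text = eps_model(x_text, t, text_emb)   # text-conditioned prediction
    eps_null = eps_model(x_null, t, null_emb)   # null-text prediction
    # Extrapolate toward text guidance and away from null-text guidance.
    return eps_null + scale * (eps_text - eps_null)

def toy_eps_model(x, t, emb):
    # Hypothetical stand-in for a trained noise predictor (not a real model).
    return 0.1 * x + emb

rng = np.random.default_rng(0)
x_t = rng.standard_normal((4, 4))                    # text noisy image
x_later = x_t + 0.5 * rng.standard_normal((4, 4))    # assumed stand-in for x_{t+Δt}
x_clean = np.zeros((4, 4))                           # assumed stand-in for a clean input image
text_emb, null_emb, scale = 1.0, 0.0, 7.5

standard = guided_noise(toy_eps_model, x_t, x_t, 10, text_emb, null_emb, scale)
back_d = guided_noise(toy_eps_model, x_t, x_later, 10, text_emb, null_emb, scale)
image_d = guided_noise(toy_eps_model, x_t, x_clean, 10, text_emb, null_emb, scale)
```

In this sketch only the `x_null` argument changes between the three calls, which mirrors the plug-and-play claim: the sampler and the model stay untouched, and the disturbance is confined to the input of the null-text branch.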

updated: Thu May 11 2023 10:36:52 GMT+0000 (UTC)

published: Thu May 11 2023 10:36:52 GMT+0000 (UTC)

arXiv
