Training-free Style Transfer Emerges from h-space in Diffusion models

Jaeseok Jeong; Mingi Kwon; Youngjung Uh

Diffusion models (DMs) synthesize high-quality images in various domains. However, controlling their generative process remains poorly understood because the intermediate variables in the process have not been rigorously studied. Recently, StyleCLIP-like editing of DMs was found to be possible in the bottleneck of the U-Net, named h-space. In this paper, we discover that DMs inherently have disentangled representations for the content and style of the resulting images: h-space contains the content, and the skip connections convey the style. Furthermore, we introduce a principled way to inject the content of one image into another, considering the progressive nature of the generative process. Briefly, given the original generative process, 1) the feature of the source content should be gradually blended, 2) the blended feature should be normalized to preserve the distribution, and 3) the change in the skip connections due to content injection should be calibrated. The resulting image then has the source content with the style of the original image, just like image-to-image translation. Interestingly, injecting content into styles of unseen domains produces harmonization-like style transfer. To the best of our knowledge, our method introduces the first training-free feed-forward style transfer that uses only an unconditional, pretrained, frozen generative network. The code is available at https://curryjung.github.io/DiffStyle/.
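
Steps 1) and 2) of the abstract can be illustrated with a small NumPy sketch. This is only one possible reading, not the authors' implementation: the abstract does not give the blending rule, the schedule, or the normalization, so the linear interpolation, the linear ramp of the ratio `gamma`, the mean/std matching, and the names `blend_and_normalize` and `gamma_schedule` are all assumptions made for illustration. Random arrays stand in for h-space features, and step 3) (skip-connection calibration) is not sketched because the abstract gives no detail on it.

```python
import numpy as np

def blend_and_normalize(h_original, h_source, gamma):
    """Blend source content into the original bottleneck feature, then
    rescale so the result keeps the original feature's mean and std.
    (Assumed rule: linear blend + statistics matching.)"""
    blended = (1.0 - gamma) * h_original + gamma * h_source
    # Standardize the blend, then restore the original statistics.
    standardized = (blended - blended.mean()) / (blended.std() + 1e-8)
    return standardized * h_original.std() + h_original.mean()

def gamma_schedule(num_steps, gamma_max):
    """Hypothetical linear ramp of the blend ratio over sampling steps."""
    return np.linspace(0.0, gamma_max, num_steps)

# Toy stand-ins for h-space features of the original and source images.
rng = np.random.default_rng(0)
h_orig = rng.normal(0.0, 1.0, size=(8, 4, 4))
h_src = rng.normal(0.5, 2.0, size=(8, 4, 4))

# Gradually increase the amount of injected content step by step.
for gamma in gamma_schedule(5, 0.6):
    h_mix = blend_and_normalize(h_orig, h_src, gamma)
```

In this sketch the blended feature always ends up with the same mean and standard deviation as the original feature, which is one simple way to "preserve the distribution"; other choices, such as norm-preserving interpolation, would fit the abstract's description equally well.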

updated: Mon Mar 27 2023 17:19:50 GMT+0000 (UTC)

published: Mon Mar 27 2023 17:19:50 GMT+0000 (UTC)

arXiv
