Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer

Serin Yang; Hyunmin Hwang; Jong Chul Ye

Diffusion models have shown great promise in text-guided image style transfer, but their stochastic nature creates a trade-off between style transformation and content preservation. Existing methods require computationally expensive fine-tuning of diffusion models or additional neural networks. To address this, we propose a zero-shot contrastive loss for diffusion models that requires neither additional fine-tuning nor auxiliary networks. By leveraging a patch-wise contrastive loss between generated samples and original image embeddings in the pre-trained diffusion model, our method can generate images with the same semantic content as the source image in a zero-shot manner. Our approach outperforms existing methods while preserving content and requiring no additional training, not only for image style transfer but also for image-to-image translation and manipulation. Our experimental results validate the effectiveness of the proposed method.
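
The abstract does not spell out the form of the patch-wise contrastive loss, so the following is only a minimal sketch of one common formulation (an InfoNCE-style loss over patches), not the authors' implementation. It assumes that per-patch feature vectors for the generated sample and the source image have already been extracted at matching spatial locations; the function names, the temperature value `tau=0.07`, and the use of plain Python lists are all illustrative assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def patch_contrastive_loss(gen_feats, src_feats, tau=0.07):
    """InfoNCE-style loss over patches (hypothetical helper, not from the paper).

    gen_feats[i] and src_feats[i] are feature vectors for the same patch
    location in the generated and source images. The source patch at the
    same location is the positive; all other source patches are negatives.
    """
    if not gen_feats or len(gen_feats) != len(src_feats):
        raise ValueError("need the same non-zero number of patches in both inputs")

    queries = [l2_normalize(v) for v in gen_feats]
    keys = [l2_normalize(v) for v in src_feats]

    total = 0.0
    for i, q in enumerate(queries):
        # Cosine similarity of this query to every source patch, scaled by tau.
        logits = [sum(a * b for a, b in zip(q, k)) / tau for k in keys]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matching location as the correct class.
        total += log_denom - logits[i]
    return total / len(queries)


if __name__ == "__main__":
    src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    aligned = patch_contrastive_loss(src, src)
    shuffled = patch_contrastive_loss([src[1], src[2], src[0]], src)
    print(aligned < shuffled)
```

In this toy usage, features that match the source patch by patch give a much lower loss than the same features at shuffled locations, which is the property that encourages content preservation. How such a loss would be wired into the sampling process of a pre-trained diffusion model is not described in the abstract and is not shown here.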

updated: Wed Mar 15 2023 13:47:02 GMT+0000 (UTC)

published: Wed Mar 15 2023 13:47:02 GMT+0000 (UTC)

arXiv
