Conditional Score Guidance for Text-Driven Image-to-Image Translation

Hyunsoo Lee; Minsoo Kang; Bohyung Han

We present a novel algorithm for text-driven image-to-image translation based on a pretrained text-to-image diffusion model. Our method aims to generate a target image by selectively editing the regions of interest in a source image, as defined by a modifying text, while preserving the remaining parts. In contrast to existing techniques that rely solely on a target prompt, we introduce a new score function, tailored to the translation task, that considers both the source prompt and the source image. To this end, we derive the conditional score function in a principled manner, decomposing it into a standard score and a guiding term for target image generation. To compute the gradient of the guiding term, we model the posterior distribution as a Gaussian and estimate its mean and variance without requiring additional training. In addition, to enhance the conditional score guidance, we incorporate a simple yet effective mixup method. This method combines two cross-attention maps, derived from the source and target latents, so that the generated target image fuses the original parts of the source image with the edited regions aligned with the target prompt. Through comprehensive experiments, we demonstrate that our approach achieves outstanding image-to-image translation performance on various tasks.
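To make the mixup idea concrete, the following is a minimal sketch of one simple way to combine two cross-attention maps: a convex combination. The abstract does not state the exact mixing rule, so the function name `mixup_attention`, the scalar `weight`, and the convex-combination form are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def mixup_attention(attn_src, attn_tgt, weight):
    """Blend two cross-attention maps (hypothetical helper, not the paper's code).

    attn_src, attn_tgt: arrays of identical shape whose last axis holds
        attention probabilities over text tokens (each row sums to 1).
    weight: scalar in [0, 1]; the share given to the source map (assumed form).
    """
    attn_src = np.asarray(attn_src, dtype=float)
    attn_tgt = np.asarray(attn_tgt, dtype=float)
    if attn_src.shape != attn_tgt.shape:
        raise ValueError("attention maps must have the same shape")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # A convex combination keeps every row a valid probability distribution.
    return weight * attn_src + (1.0 - weight) * attn_tgt


# Toy maps: 2 spatial positions attending over 2 text tokens.
source_map = np.array([[0.7, 0.3], [0.2, 0.8]])
target_map = np.array([[0.1, 0.9], [0.5, 0.5]])
mixed_map = mixup_attention(source_map, target_map, weight=0.25)
```

Under this assumed form, a larger `weight` keeps the result closer to the source attention, which favors preserving the source layout, while a smaller `weight` favors the target prompt.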

updated: Mon May 29 2023 10:48:34 GMT+0000 (UTC)

published: Mon May 29 2023 10:48:34 GMT+0000 (UTC)

arXiv
