DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation

Yueming Lyu; Tianwei Lin; Fu Li; Dongliang He; Jing Dong; Tieniu Tan

Text-driven image manipulation remains challenging in terms of training or inference flexibility. Conditional generative models depend heavily on expensive annotated training data. Meanwhile, recent frameworks that leverage pre-trained vision-language models are limited by either per-text-prompt optimization or inference-time hyper-parameter tuning. In this work, we propose a novel framework named DeltaEdit to address these problems. Our key idea is to investigate and identify a space, namely the delta image and text space, in which the distribution of CLIP visual feature differences between two images is well aligned with that of CLIP textual embedding differences between source and target texts. Based on this CLIP delta space, the DeltaEdit network is designed to map CLIP visual feature differences to StyleGAN editing directions in the training phase. Then, in the inference phase, DeltaEdit predicts StyleGAN editing directions from the differences of CLIP textual features. In this way, DeltaEdit is trained in a text-free manner. Once trained, it generalizes well to various text prompts for zero-shot inference without bells and whistles. Code is available at https://github.com/Yueming6568/DeltaEdit.
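
The train-on-image-deltas, infer-on-text-deltas idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: plain Python lists stand in for CLIP features and StyleGAN latent codes, a single linear map stands in for the DeltaEdit network, and the names `train_mapper`, `apply_mapper`, and `edit` are hypothetical. The only assumption carried over from the abstract is that image-feature differences and text-feature differences live in a shared, aligned delta space, so a mapper fitted on the former can be applied to the latter.

```python
import random


def sub(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]


def apply_mapper(W, delta):
    """Linear stand-in (assumption) for the DeltaEdit network.

    Maps a delta in the shared feature space to an editing direction
    in the latent space; W has one row per latent dimension.
    """
    return [sum(w * d for w, d in zip(row, delta)) for row in W]


def train_mapper(img_feats, latents, steps=5000, lr=0.05, seed=0):
    """Text-free training: only image features and latent codes are used.

    For a random pair of training images (i, j), the input is the difference
    of their image features and the target is the difference of their latent
    codes. W is updated by plain SGD on the squared error.
    """
    rng = random.Random(seed)
    d_feat, d_lat = len(img_feats[0]), len(latents[0])
    W = [[0.0] * d_feat for _ in range(d_lat)]
    for _ in range(steps):
        i, j = rng.sample(range(len(img_feats)), 2)
        delta_i = sub(img_feats[j], img_feats[i])   # image-feature delta
        delta_s = sub(latents[j], latents[i])       # latent-code delta (target)
        err = sub(apply_mapper(W, delta_i), delta_s)
        for r in range(d_lat):
            for c in range(d_feat):
                W[r][c] -= lr * err[r] * delta_i[c]
    return W


def edit(W, latent, text_src, text_tgt):
    """Inference: swap the image delta for a text delta.

    The editing direction is predicted from the difference between the
    target-text and source-text features, then added to the latent code.
    """
    delta_t = sub(text_tgt, text_src)
    direction = apply_mapper(W, delta_t)
    return [s + d for s, d in zip(latent, direction)]
```

In a real pipeline the feature vectors would come from a CLIP image encoder during training and a CLIP text encoder during inference, and the edited latent code would be passed to a StyleGAN generator. Those components are omitted here to keep the sketch self-contained.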

updated: Sat Mar 11 2023 02:38:31 GMT+0000 (UTC)

published: Sat Mar 11 2023 02:38:31 GMT+0000 (UTC)

arXiv
