DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Contrastive Prompt-Tuning

Ziyi Dong; Pengxu Wei; Liang Lin

Large-scale text-to-image generation models have achieved remarkable progress in synthesizing high-quality, feature-rich, high-resolution images guided by text. However, these models often struggle with novel concepts, e.g., new styles or object entities. Recent attempts have employed fine-tuning or prompt-tuning strategies to teach a pre-trained diffusion model novel concepts from a set of reference images. These approaches tend to overfit to the given reference images, particularly in one-shot applications, which hinders the generation of diverse, high-quality images while maintaining generation controllability. To tackle this challenge, we present a simple yet effective method called DreamArtist, which employs a positive-negative prompt-tuning learning strategy. Specifically, DreamArtist incorporates both positive and negative embeddings and trains them jointly. The positive embedding aggressively captures the salient characteristics of the reference image to drive diversified generation, and the negative embedding rectifies inadequacies of the positive embedding. DreamArtist thus learns not only what is correct, but also what can be avoided or improved. We have conducted extensive experiments evaluating the proposed method in terms of image similarity and diversity, generation controllability, and style cloning, and DreamArtist achieves superior generation performance over existing methods. Besides, our additional evaluation on extended tasks, including concept composition and prompt-guided image editing, demonstrates its effectiveness for more applications.
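The abstract says the positive and negative embeddings are trained jointly, but it does not say how they are combined during training. The sketch below illustrates one plausible reading. It assumes a classifier-free-guidance-style fusion, `eps_neg + gamma * (eps_pos - eps_neg)`, and replaces the frozen diffusion model with a hypothetical toy denoiser so that the gradients can be written by hand. The names `toy_denoiser`, `fused_prediction`, `make_samples`, and `train`, along with the guidance weight `gamma` and the learning rate, are illustrative assumptions and not the authors' code.

```python
import math
import random


def toy_denoiser(x_t, emb):
    """Hypothetical stand-in for a frozen text-conditioned noise predictor.

    It predicts noise as 0.5 * x_t plus the conditioning embedding, elementwise,
    so that gradients can be written by hand.
    """
    return [0.5 * x + e for x, e in zip(x_t, emb)]


def fused_prediction(x_t, pos_emb, neg_emb, gamma):
    """Assumed classifier-free-guidance-style fusion: eps_neg + gamma * (eps_pos - eps_neg)."""
    eps_pos = toy_denoiser(x_t, pos_emb)
    eps_neg = toy_denoiser(x_t, neg_emb)
    return [n + gamma * (p - n) for p, n in zip(eps_pos, eps_neg)]


def make_samples(x0, alpha_bars, seed=0):
    """Noisy copies of a single reference vector x0 (the one-shot setting)."""
    rng = random.Random(seed)
    samples = []
    for ab in alpha_bars:
        noise = [rng.gauss(0.0, 1.0) for _ in x0]
        x_t = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * z for x, z in zip(x0, noise)]
        samples.append((x_t, noise))
    return samples


def loss(samples, pos_emb, neg_emb, gamma):
    """Mean squared error between the fused noise prediction and the true noise."""
    total, count = 0.0, 0
    for x_t, noise in samples:
        pred = fused_prediction(x_t, pos_emb, neg_emb, gamma)
        total += sum((p - z) ** 2 for p, z in zip(pred, noise))
        count += len(noise)
    return total / count


def train(samples, dim, gamma=3.0, lr=0.05, steps=200):
    """Jointly optimize the positive and negative embeddings by gradient descent."""
    pos_emb = [0.0] * dim
    neg_emb = [0.0] * dim
    scale = 2.0 / (len(samples) * dim)
    for _ in range(steps):
        grad_pos = [0.0] * dim
        grad_neg = [0.0] * dim
        for x_t, noise in samples:
            pred = fused_prediction(x_t, pos_emb, neg_emb, gamma)
            for d in range(dim):
                r = pred[d] - noise[d]
                # d(pred)/d(pos) = gamma and d(pred)/d(neg) = 1 - gamma
                grad_pos[d] += scale * r * gamma
                grad_neg[d] += scale * r * (1.0 - gamma)
        # Both embeddings are updated in the same step (joint training).
        pos_emb = [p - lr * g for p, g in zip(pos_emb, grad_pos)]
        neg_emb = [n - lr * g for n, g in zip(neg_emb, grad_neg)]
    return pos_emb, neg_emb


if __name__ == "__main__":
    x0 = [2.0, -1.0, 3.0]
    samples = make_samples(x0, [0.9, 0.7, 0.5, 0.3])
    zeros = [0.0] * len(x0)
    print("loss before:", loss(samples, zeros, zeros, 3.0))
    pos, neg = train(samples, dim=len(x0))
    print("loss after: ", loss(samples, pos, neg, 3.0))
```

Because this toy denoiser is linear in the embedding, only the fused combination `gamma * pos + (1 - gamma) * neg` is identifiable. The two embeddings are therefore not uniquely determined here. In a real text-conditioned network, each embedding would pass through the model separately, and that separation is what could let a negative embedding carry information distinct from the positive one.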

updated: Thu Mar 16 2023 07:38:02 GMT+0000 (UTC)

published: Mon Nov 21 2022 10:37:56 GMT+0000 (UTC)

arXiv