DreamArtist: Towards Controllable One-Shot Text-to-Image Generation via Contrastive Prompt-Tuning

Ziyi Dong; Pengxu Wei; Liang Lin

Large-scale text-to-image generation models, which have evolved at an exponential pace, can now synthesize high-resolution, feature-rich, high-quality images from text guidance. However, they often fail to handle words for the new concepts, styles, and object entities that continually emerge. Recent attempts use fine-tuning or prompt-tuning to teach the model a new concept as a new pseudo-word learned from a given set of reference images. These methods still struggle to synthesize diverse, high-quality images free of distortions and artifacts, and they also suffer from low controllability. To address these problems, we propose DreamArtist, a method built on a contrastive prompt-tuning strategy that introduces both a positive and a negative embedding as pseudo-words and trains them jointly. The positive embedding aggressively learns the characteristics of the reference image to drive diverse generation, while the negative embedding introspects in a self-supervised manner, correcting the mistakes and inadequacies of the positive embedding in the opposite direction. The model thus learns not only what is correct but also what should be avoided. Extensive experiments on image quality and diversity, controllability, model learning, and task expansion demonstrate that our model learns not only the concept but also its form, content, and context. DreamArtist's pseudo-words behave like true words and generate high-quality images.
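The joint training of a positive and a negative embedding can be illustrated with a toy sketch. The abstract does not specify how the two embeddings are combined, so this sketch assumes a classifier-free-guidance-style fusion, prediction = n + gamma * (p - n), where p and n are the outputs for the positive and negative embeddings. The frozen "denoiser" here is a hypothetical fixed 4x4 linear map W instead of a diffusion U-Net. The names fused_prediction and train, the scale gamma, and the target vector are all illustrative assumptions and are not the authors' code.

```python
import random

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def fused_prediction(W, e_pos, e_neg, gamma):
    # Hypothetical frozen "denoiser": a fixed linear map W applied to an embedding.
    # Assumed fusion (classifier-free-guidance style): start from the negative
    # branch and move toward the positive branch by the scale gamma.
    p = matvec(W, e_pos)
    n = matvec(W, e_neg)
    return [ni + gamma * (pi - ni) for pi, ni in zip(p, n)]

def mse(pred, target):
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(target)

def train(W, target, gamma=3.0, lr=0.05, steps=200, seed=0):
    """Jointly fit positive and negative embeddings while W stays frozen."""
    dim = len(W[0])
    d = len(target)
    rng = random.Random(seed)
    e_pos = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    e_neg = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    history = [mse(fused_prediction(W, e_pos, e_neg, gamma), target)]
    for _ in range(steps):
        pred = fused_prediction(W, e_pos, e_neg, gamma)
        # Gradient of the mean squared error with respect to the fused prediction
        g = [2.0 * (a - b) / d for a, b in zip(pred, target)]
        # Back-propagate through the frozen map: W^T g
        wt_g = [sum(W[i][j] * g[i] for i in range(d)) for j in range(dim)]
        # pred = (1 - gamma) * W e_neg + gamma * W e_pos, so the chain rule scales
        # the shared gradient by gamma for e_pos and by (1 - gamma) for e_neg.
        e_pos = [x - lr * gamma * gj for x, gj in zip(e_pos, wt_g)]
        e_neg = [x - lr * (1.0 - gamma) * gj for x, gj in zip(e_neg, wt_g)]
        history.append(mse(fused_prediction(W, e_pos, e_neg, gamma), target))
    return e_pos, e_neg, history

# Toy, well-conditioned frozen map and an arbitrary target (both hypothetical)
W = [
    [1.0, 0.2, 0.0, 0.0],
    [0.0, 1.0, 0.2, 0.0],
    [0.0, 0.0, 1.0, 0.2],
    [0.2, 0.0, 0.0, 1.0],
]
target = [0.5, -1.0, 0.25, 2.0]
e_pos, e_neg, history = train(W, target)
print(f"loss: {history[0]:.4f} -> {history[-1]:.2e}")
```

In this toy, gamma > 1 makes the factor (1 - gamma) negative, so each step moves the negative embedding in the direction opposite to the positive one. This is one simple way to picture a negative embedding that corrects the positive one "in the opposite direction". Whether DreamArtist uses exactly this fusion and loss should be checked against the paper itself.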

updated: Mon Nov 21 2022 10:37:56 GMT+0000 (UTC)

published: Mon Nov 21 2022 10:37:56 GMT+0000 (UTC)

arXiv