Optimizing Prompts for Text-to-Image Generation

Yaru Hao; Zewen Chi; Li Dong; Furu Wei

Well-designed prompts can guide text-to-image models to generate amazing images. However, performant prompts are often model-specific and misaligned with user input. Instead of laborious human engineering, we propose prompt adaptation, a general framework that automatically adapts original user input to model-preferred prompts. Specifically, we first perform supervised fine-tuning with a pretrained language model on a small collection of manually engineered prompts. Then we use reinforcement learning to explore better prompts. We define a reward function that encourages the policy to generate more aesthetically pleasing images while preserving the original user intent. Experimental results on Stable Diffusion show that our method outperforms manual prompt engineering in terms of both automatic metrics and human preference ratings. Moreover, reinforcement learning further boosts performance, especially on out-of-domain prompts. The pretrained checkpoints are available at https://aka.ms/promptist. The demo can be found at https://aka.ms/promptist-demo.
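
The abstract describes a reward that favors more aesthetically pleasing images while preserving the user's intent, but it does not give the formula. The following is only a minimal sketch of how such a reward could be shaped, not the paper's definition. The names `ImageScores` and `prompt_reward`, the parameters `relevance_floor` and `relevance_weight`, and their default values are all hypothetical. The scores are assumed to come from some external aesthetic predictor and some image-text similarity model, neither of which is implemented here.

```python
from dataclasses import dataclass


@dataclass
class ImageScores:
    """Scores for one generated image (both fields are assumed inputs)."""
    aesthetic: float  # output of a hypothetical aesthetic predictor
    relevance: float  # hypothetical similarity between the image and the ORIGINAL user input


def prompt_reward(
    original: ImageScores,
    adapted: ImageScores,
    relevance_floor: float = 0.28,   # hypothetical threshold, not taken from the source
    relevance_weight: float = 20.0,  # hypothetical scale, not taken from the source
) -> float:
    """Reward an adapted prompt for aesthetic gain, penalizing drift from user intent.

    The aesthetic term is the improvement of the adapted prompt's image over the
    image produced by the original user input. The relevance term is zero while
    relevance stays at or above the floor and becomes negative below it, so the
    policy gains nothing by pushing relevance higher but is penalized for
    ignoring what the user asked for.
    """
    aesthetic_gain = adapted.aesthetic - original.aesthetic
    relevance_penalty = min(relevance_weight * (adapted.relevance - relevance_floor), 0.0)
    return aesthetic_gain + relevance_penalty


if __name__ == "__main__":
    baseline = ImageScores(aesthetic=5.0, relevance=0.30)
    faithful = ImageScores(aesthetic=6.0, relevance=0.30)  # prettier, still on topic
    drifted = ImageScores(aesthetic=6.0, relevance=0.18)   # prettier, but off topic
    print(prompt_reward(baseline, faithful))
    print(prompt_reward(baseline, drifted))
```

In this sketch the relevance term is capped at zero, which is one way to keep the policy from trading intent preservation for aesthetics. In a reinforcement learning loop, a scalar like this would be computed per sampled prompt and passed to the policy update.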

updated: Mon Dec 19 2022 16:50:41 GMT+0000 (UTC)

published: Mon Dec 19 2022 16:50:41 GMT+0000 (UTC)

arXiv
