A Prompt Log Analysis of Text-to-Image Generation Systems

Yutong Xie; Zhaoying Pan; Jinge Ma; Luo Jie; Qiaozhu Mei

Recent developments in large language models (LLMs) and generative AI have unleashed the astonishing capabilities of text-to-image generation systems, which synthesize high-quality images that are faithful to a given reference text, known as a "prompt". These systems have quickly attracted considerable attention from researchers, creators, and general users. Despite the many efforts to improve the generative models, there is limited work on understanding the information needs of the users of these systems at scale. We conduct the first comprehensive analysis of large-scale prompt logs collected from multiple text-to-image generation systems. Our work is analogous to analyzing the query logs of Web search engines, a line of work that has made critical contributions to the success of the Web search industry and research. Compared with Web search queries, text-to-image prompts are significantly longer, are often organized into special structures that consist of the subject, form, and intent of the generation task, and present unique categories of information needs. Users also make more edits within creation sessions, which exhibit remarkable exploratory patterns. There is also a considerable gap between user-input prompts and the captions of the images included in the open training data of the generative models. Our findings provide concrete implications for how to improve text-to-image generation systems for creation purposes.

updated: Thu Mar 16 2023 06:03:08 GMT+0000 (UTC)

published: Wed Mar 08 2023 13:59:41 GMT+0000 (UTC)

arXiv
