Composition and Deformance: Measuring Imageability with a Text-to-Image Model

Si Wu; David A. Smith

Although psycholinguists and psychologists have long studied the tendency of linguistic strings to evoke mental images in hearers or readers, most computational studies have applied this concept of imageability only to isolated words. Using recent developments in text-to-image generation models, such as DALLE mini, we propose computational methods that use generated images to measure the imageability of both single English words and connected text. We sample text prompts for image generation from three corpora: human-generated image captions, news article sentences, and poem lines. We subject these prompts to different deformances to examine the model's ability to detect changes in imageability caused by compositional change. We find a high correlation between the proposed computational measures of imageability and human judgments of individual words. We also find that the proposed measures respond more consistently to changes in compositionality than baseline approaches do. We discuss possible effects of model training and implications for the study of compositionality in text-to-image models.

updated: Mon Jun 05 2023 18:22:23 GMT+0000 (UTC)

published: Mon Jun 05 2023 18:22:23 GMT+0000 (UTC)

arXiv
