Training-Free Location-Aware Text-to-Image Synthesis

Jiafeng Mao; Xueting Wang

Current large-scale generative models are highly efficient at generating high-quality images from text prompts. However, they cannot precisely control the size and position of objects in the generated image. In this study, we analyze the generative mechanism of the Stable Diffusion model and propose a new interactive generation paradigm that lets users specify the position of generated objects without additional training. We also propose an object-detection-based evaluation metric to assess control capability on the location-aware generation task. Our experimental results show that our method outperforms state-of-the-art methods in both control capability and image quality.
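The abstract does not spell out how the detection-based metric is computed. As an illustration only, one common way to compare a user-requested box with a box returned by an object detector is intersection over union (IoU). The sketch below assumes boxes in `(x_min, y_min, x_max, y_max)` form; the names `box_iou` and `placement_hit` and the 0.5 threshold are hypothetical and not taken from the paper.

```python
from typing import Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of a box; degenerate boxes count as zero area."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def placement_hit(requested: Box, detected: Box, threshold: float = 0.5) -> bool:
    """Hypothetical check: does the detected object overlap the requested
    region by at least `threshold` IoU? The threshold is an assumption."""
    return box_iou(requested, detected) >= threshold
```

Under these assumptions, `placement_hit((0, 0, 10, 10), (0, 0, 10, 8))` would count as a hit, since the two boxes overlap with an IoU of 0.8.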

updated: Wed Apr 26 2023 10:25:15 GMT+0000 (UTC)

published: Wed Apr 26 2023 10:25:15 GMT+0000 (UTC)

arXiv
