Text Semantics to Image Generation: A Method of Building Facade Design Based on the Stable Diffusion Model

Haoran Ma

Stable Diffusion models have been extensively employed in the study of architectural image generation, but there is still room to improve the controllability of the generated image content. This work proposes a method that combines multiple networks to generate building facade images from text. We first fine-tuned the Stable Diffusion model on the CMP Facades dataset using the LoRA (Low-Rank Adaptation) approach, then applied the ControlNet model to further control the output. Finally, we compared the facade generation results under various architectural-style text prompts and control strategies. The results demonstrate that the LoRA training approach greatly reduces the cost of fine-tuning the large Stable Diffusion model, and that adding the ControlNet model improves the controllability of text-to-building-facade image generation. This provides a foundation for subsequent studies on architectural image generation.
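The abstract credits LoRA with making fine-tuning of the large Stable Diffusion model more practical. The minimal sketch below illustrates the general LoRA idea on a single linear layer using NumPy; it is not the authors' training code. The class name `LoRALinear`, the 768×768 layer size, and the rank/alpha values are hypothetical choices for illustration. The pretrained weight `W` stays frozen, and only a low-rank pair `A`, `B` is trained, with `B` initialized to zero so the adapted layer starts out identical to the original.

```python
import numpy as np


class LoRALinear:
    """Hypothetical sketch: frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight  # pretrained weight, kept frozen during fine-tuning
        self.rank = rank
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero, so the initial update B @ A is zero
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out); frozen path plus scaled low-rank path
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        # Only A and B would receive gradient updates
        return self.A.size + self.B.size

    def merged_weight(self):
        # After training, the update can be folded into W so inference cost is unchanged
        return self.weight + self.scale * self.B @ self.A


d_in = d_out = 768  # hypothetical layer size, for illustration only
rng = np.random.default_rng(42)
W = rng.normal(size=(d_out, d_in))
layer = LoRALinear(W, rank=4)

x = rng.normal(size=(2, d_in))
print(np.allclose(layer(x), x @ W.T))    # True: B is zero, so output matches the frozen layer
print(W.size, layer.trainable_params())  # 589824 6144
```

With rank 4, the trainable update has 6,144 parameters versus 589,824 in the frozen 768×768 matrix, roughly 1%, which is the kind of saving that makes adapting a large pretrained model to a small dataset feasible.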

updated: Thu Mar 23 2023 01:20:55 GMT+0000 (UTC)

published: Thu Feb 23 2023 14:03:07 GMT+0000 (UTC)

arXiv