AI Illustrator: Translating Raw Descriptions into Images by Prompt-based Cross-Modal Generation

Yiyang Ma; Huan Yang; Bei Liu; Jianlong Fu; Jiaying Liu

AI illustrator aims to automatically design visually appealing images for books to provoke rich thoughts and emotions. To achieve this goal, we propose a framework for translating raw descriptions with complex semantics into semantically corresponding images. The main challenge lies in the semantic complexity of raw descriptions, which may contain concepts that are hard to visualize (e.g., "gloomy" or "Asian"). Existing methods usually struggle to handle such descriptions. To address this issue, we propose a Prompt-based Cross-Modal Generation Framework (PCM-Frame) that leverages two powerful pre-trained models, CLIP and StyleGAN. Our framework consists of two components: a prompt-based projection module from Text Embeddings to Image Embeddings, and an adapted image generation module built on StyleGAN, which takes Image Embeddings as inputs and is trained with combined semantic consistency losses. To bridge the gap between realistic images and illustration designs, we further adopt a stylization model as post-processing in our framework for better visual effects. Benefiting from the pre-trained models, our method can handle complex descriptions and does not require external paired data for training. Furthermore, we have built a benchmark that consists of 200 raw descriptions. We conduct a user study to demonstrate our superiority over competing methods on complicated texts. We release our code at https://github.com/researchmm/AI_Illustrator.
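The two-component design described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear projection, the `1 - cosine similarity` loss, the 8-dimensional vectors, and every function name below are assumptions chosen only to show how a projected embedding might be compared against the embedding of a generated image. The abstract does not specify the actual form of the projection module or of the combined semantic consistency losses.

```python
import math
import random

EMBED_DIM = 8  # assumption: toy size; real CLIP embeddings are much larger


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def project_text_to_image(text_emb, weights):
    """Hypothetical projection module: a single linear map stands in for
    the learned text-embedding -> image-embedding projection."""
    return [sum(w * x for w, x in zip(row, text_emb)) for row in weights]


def semantic_consistency_loss(target_emb, generated_emb):
    """One plausible consistency term (an assumption): 1 - cosine similarity
    between the target embedding and the generated image's embedding."""
    return 1.0 - cosine_similarity(target_emb, generated_emb)


# Toy data: a random "text embedding" and an identity projection.
rng = random.Random(0)
text_emb = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]
identity = [[1.0 if i == j else 0.0 for j in range(EMBED_DIM)]
            for i in range(EMBED_DIM)]
image_emb = project_text_to_image(text_emb, identity)

# The loss is ~0 for matching embeddings and ~2 for opposite ones.
loss_same = semantic_consistency_loss(image_emb, image_emb)
loss_opposite = semantic_consistency_loss(image_emb, [-x for x in image_emb])
```

In a real system the embeddings would come from a pre-trained encoder and the generated image from a StyleGAN-based generator; the sketch only shows the shape of the comparison that a consistency loss performs.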

updated: Thu Sep 08 2022 04:24:35 GMT+0000 (UTC)

published: Wed Sep 07 2022 13:53:54 GMT+0000 (UTC)

arXiv

