GlyphDraw: Seamlessly Rendering Text with Intricate Spatial Structures in Text-to-Image Generation

Jian Ma; Mingjun Zhao; Chen Chen; Ruichen Wang; Di Niu; Haonan Lu; Xiaodong Lin

Recent breakthroughs in language-guided image generation have yielded impressive achievements, enabling the creation of high-quality and diverse images based on user instructions. Although the synthesis performance is fascinating, one significant limitation of current image generation models is their insufficient ability to generate coherent text within images, particularly for complex glyph structures such as Chinese characters. To address this problem, we introduce GlyphDraw, a general learning framework that aims to endow image generation models with the capacity to generate images coherently embedded with text in any specific language. We first carefully design the construction strategy for the image-text dataset, then build our model on a diffusion-based image generator and modify its network structure so that the model learns to draw language characters with the help of glyph and position information. Furthermore, we maintain the model's open-domain image synthesis capability by using parameter-efficient fine-tuning techniques to prevent catastrophic forgetting. Extensive qualitative and quantitative experiments demonstrate that our method not only produces the exact characters given in the prompt, but also seamlessly blends the generated text into the background. Please refer to our project page: https://1073521013.github.io/glyph-draw.github.io/
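The abstract does not say how the "position information" is encoded or how it enters the network. As a rough illustration only, the sketch below assumes one plausible encoding: a binary mask marking the region where text should appear, stacked with a glyph channel as extra input channels next to the image latent. The function names `make_position_mask` and `stack_condition_channels`, the box convention, and the channel layout are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Grid = List[List[float]]


def make_position_mask(height: int, width: int,
                       box: Tuple[int, int, int, int]) -> Grid:
    """Build a height x width binary mask.

    `box` is (top, left, bottom, right) in grid cells, with bottom and
    right exclusive. Cells inside the box are 1.0, all others 0.0.
    This box convention is an assumption made for the sketch.
    """
    top, left, bottom, right = box
    if not (0 <= top <= bottom <= height and 0 <= left <= right <= width):
        raise ValueError("box must lie inside the grid")
    return [
        [1.0 if top <= r < bottom and left <= c < right else 0.0
         for c in range(width)]
        for r in range(height)
    ]


def stack_condition_channels(latent: List[Grid], glyph: Grid,
                             mask: Grid) -> List[Grid]:
    """Append a glyph channel and a mask channel to the latent channels.

    Hypothetical layout: [latent channels..., glyph, mask]. A real model
    would need its first layer widened to accept the extra channels.
    """
    return latent + [glyph, mask]


# Example: a 4-channel 4x6 latent with text placed in rows 1-2, columns 2-4.
height, width = 4, 6
latent = [[[0.0] * width for _ in range(height)] for _ in range(4)]
glyph = [[0.0] * width for _ in range(height)]  # placeholder glyph image
mask = make_position_mask(height, width, (1, 2, 3, 5))
conditioned = stack_condition_channels(latent, glyph, mask)
```

Plain nested lists keep the sketch dependency-free; a practical version would use array tensors and a rendered glyph image in place of the zero-filled placeholder.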

updated: Tue May 23 2023 04:07:00 GMT+0000 (UTC)

published: Fri Mar 31 2023 08:06:33 GMT+0000 (UTC)

arXiv
