GR-GAN: Gradual Refinement Text-to-image Generation

Bo Yang; Fangxiang Feng; Xiaojie Wang

A good Text-to-Image model should not only generate high-quality images but also ensure consistency between the text and the generated image. Previous models failed to achieve both well at the same time. This paper proposes a Gradual Refinement Generative Adversarial Network (GR-GAN) to alleviate the problem efficiently. A GRG module is designed to generate images stage by stage, from low resolution to high resolution, with the corresponding text constraints moving from coarse granularity (sentence) to fine granularity (word). An ITM module is designed to provide image-text matching losses at both the sentence-image level and the word-region level for the corresponding stages. We also introduce a new metric, Cross-Model Distance (CMD), for simultaneously evaluating image quality and image-text consistency. Experimental results show that GR-GAN significantly outperforms previous models and achieves a new state of the art on both FID and CMD. A detailed analysis demonstrates the efficiency of the different generation stages in GR-GAN.
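The abstract does not give the form of the ITM losses. The sketch below is a rough illustration only. It assumes a sentence-image matching loss in the style of the sentence-level DAMSM loss from AttnGAN: a symmetric softmax over scaled cosine similarities within a batch, where each image's own sentence is the positive and every other sentence is a negative. The function names, the batch averaging, and the smoothing factor `gamma` (a hypothetical default of 10.0) are all assumptions and are not GR-GAN's published implementation. The word-region level loss is not shown.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def _sum_log_softmax_diag(matrix: List[List[float]]) -> float:
    """Sum over rows of log p(row i matches column i), computed stably."""
    total = 0.0
    for i, row in enumerate(matrix):
        m = max(row)
        log_z = m + math.log(sum(math.exp(s - m) for s in row))
        total += row[i] - log_z
    return total


def sentence_image_matching_loss(
    images: List[Vector], sentences: List[Vector], gamma: float = 10.0
) -> float:
    """Hypothetical symmetric matching loss over a batch of paired embeddings.

    images[i] and sentences[i] are assumed to describe the same sample;
    all other pairs in the batch act as mismatched negatives.
    """
    n = len(images)
    # scores[i][j] = gamma * cos(image_i, sentence_j)
    scores = [[gamma * cosine(images[i], sentences[j]) for j in range(n)] for i in range(n)]
    transposed = [list(col) for col in zip(*scores)]
    # Image->sentence plus sentence->image directions, averaged over the batch.
    return -(_sum_log_softmax_diag(scores) + _sum_log_softmax_diag(transposed)) / n
```

With correctly aligned pairs the loss approaches zero, while shuffling the sentences against their images makes it large. That gap is the signal a generator trained with such a term would be pushed to close.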

updated: Tue Jun 21 2022 16:31:57 GMT+0000 (UTC)

published: Mon May 23 2022 12:42:04 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト