FA-GAN: Feature-Aware GAN for Text to Image Synthesis

Eunyeong Jeon; Kunhee Kim; Daijin Kim

Text-to-image synthesis aims to generate a photo-realistic image from a given natural language description. Previous works have made significant progress with Generative Adversarial Networks (GANs). Nonetheless, it is still hard to generate intact objects or clear textures (Fig. 1). To address this issue, we propose the Feature-Aware Generative Adversarial Network (FA-GAN), which synthesizes high-quality images by integrating two techniques: a self-supervised discriminator and a feature-aware loss. First, we design a self-supervised discriminator with an auxiliary decoder so that the discriminator can extract better representations. Second, we introduce a feature-aware loss that gives the generator more direct supervision by using the feature representations from the self-supervised discriminator. Experiments on the MS-COCO dataset show that our proposed method significantly improves the state-of-the-art FID score from 28.92 to 24.58 (lower is better).
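As a rough illustration of the feature-aware loss idea, the sketch below compares features of a real and a generated image produced by the same feature extractor. The abstract does not specify the distance function or the encoder, so the mean squared distance, the names `feature_aware_loss` and `toy_encoder`, and the toy inputs are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Callable, List, Sequence

Image = Sequence[float]        # a flattened image, for illustration only
Features = Sequence[float]     # a feature vector


def feature_aware_loss(
    extract_features: Callable[[Image], Features],
    real_image: Image,
    fake_image: Image,
) -> float:
    """Mean squared distance between real and generated image features.

    Assumption: the paper's exact distance is not given in the abstract;
    mean squared error is used here as a plausible stand-in.
    """
    real_feat = extract_features(real_image)
    fake_feat = extract_features(fake_image)
    if len(real_feat) != len(fake_feat):
        raise ValueError("feature vectors must have the same length")
    squared = [(r - f) ** 2 for r, f in zip(real_feat, fake_feat)]
    return sum(squared) / len(squared)


def toy_encoder(image: Image) -> List[float]:
    """Hypothetical stand-in for the discriminator's encoder.

    Returns two hand-made features: the mean pixel value and the value range.
    """
    return [sum(image) / len(image), max(image) - min(image)]


real = [0.0, 0.5, 1.0]   # toy "real" image
fake = [0.0, 0.0, 0.0]   # toy "generated" image
loss = feature_aware_loss(toy_encoder, real, fake)
same = feature_aware_loss(toy_encoder, real, real)
```

In a real model the extractor would be the discriminator's learned encoder, and this term would be added to the generator's adversarial objective; the toy encoder only shows the data flow.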

updated: Thu Sep 02 2021 13:05:36 GMT+0000 (UTC)

published: Thu Sep 02 2021 13:05:36 GMT+0000 (UTC)

arXiv

