CAGAN: Text-To-Image Generation with Combined Attention GANs

Henning Schulze; Dogucan Yaman; Alexander Waibel

Generating images from natural language descriptions is a challenging task. In this work, we propose the Combined Attention Generative Adversarial Network (CAGAN) to generate photo-realistic images from textual descriptions. CAGAN utilises two attention models: word attention, which draws different sub-regions conditioned on the related words, and squeeze-and-excitation attention, which captures non-linear interactions among channels. With spectral normalisation to stabilise training, CAGAN improves the state of the art on the Inception Score (IS) and Fréchet Inception Distance (FID) on the CUB dataset, and on the FID on the more challenging COCO dataset. Furthermore, we demonstrate that judging a model by a single evaluation metric can be misleading: an additional model that adds local self-attention scores a higher IS, outperforming the state of the art on the CUB dataset, yet generates unrealistic images through feature repetition.
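To make the squeeze-and-excitation attention mentioned in the abstract more concrete, the following is a minimal NumPy sketch of a generic squeeze-and-excitation step, not the authors' implementation. The function name `squeeze_excite`, the weight names `w_reduce` and `w_expand`, the omission of bias terms, and all tensor sizes are assumptions made for illustration only.

```python
import numpy as np


def squeeze_excite(features, w_reduce, w_expand):
    """Rescale each channel of `features` (C, H, W) by a gate in (0, 1)."""
    # Squeeze: global average pooling collapses each channel to one scalar.
    squeezed = features.mean(axis=(1, 2))                 # shape (C,)
    # Excitation: a bottleneck of two linear maps (biases omitted here),
    # ReLU then sigmoid, models non-linear interaction among channels.
    hidden = np.maximum(w_reduce @ squeezed, 0.0)         # shape (C // r,)
    gates = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))    # shape (C,)
    # Scale: broadcast the per-channel gate over the spatial dimensions.
    return features * gates[:, None, None], gates


rng = np.random.default_rng(0)
channels, reduction = 8, 4  # hypothetical sizes, not taken from the paper
features = rng.standard_normal((channels, 16, 16))
w_reduce = rng.standard_normal((channels // reduction, channels)) * 0.1
w_expand = rng.standard_normal((channels, channels // reduction)) * 0.1

scaled, gates = squeeze_excite(features, w_reduce, w_expand)
print(scaled.shape, gates.round(3))
```

In a trained network the two weight matrices would be learned; here they are random, so the gates only show the mechanics: every channel is multiplied by a single value between 0 and 1 that depends on the pooled statistics of all channels.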

updated: Mon Apr 26 2021 15:46:40 GMT+0000 (UTC)

published: Mon Apr 26 2021 15:46:40 GMT+0000 (UTC)

arXiv
