Fine-Grained Image Generation from Bangla Text Description using Attentional Generative Adversarial Network

Md Aminul Haque Palash; Md Abdullah Al Nasim; Aditi Dhali; Faria Afrin

Generating fine-grained, realistic images from text has many applications in the visual and semantic domains. With that in mind, we propose a Bangla Attentional Generative Adversarial Network (AttnGAN) that enables attention-driven, multi-stage refinement for high-resolution Bangla text-to-image generation. Our model synthesizes fine details in different sub-regions of the image by attending to the most relevant words in the natural language description. The framework achieves a better Inception Score on the CUB dataset. For the first time, fine-grained images are generated from Bangla text using an attentional GAN. Bangla ranks seventh among the 100 most spoken languages, which motivates our explicit focus on this language and addresses a need shared by many people. Moreover, Bangla has a more complex syntactic structure and fewer natural language processing resources, which further motivates this work.
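
The word-level attention mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so this assumes plain dot-product attention between image sub-region features and word features. The names `softmax` and `word_attention` and the toy feature values are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def word_attention(region_feats, word_feats):
    """For each image sub-region, return (weights, context).

    region_feats: list of N region vectors, each of length D
    word_feats:   list of T word vectors, each of length D
    Assumption: similarity is a plain dot product (not confirmed by the source).
    """
    dim = len(word_feats[0])
    results = []
    for region in region_feats:
        # Similarity between this region and every word in the description.
        scores = [sum(r * w for r, w in zip(region, word)) for word in word_feats]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of the word vectors.
        context = [
            sum(weights[t] * word_feats[t][d] for t in range(len(word_feats)))
            for d in range(dim)
        ]
        results.append((weights, context))
    return results

# Toy example with made-up values: 2 regions, 3 words, 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
attended = word_attention(regions, words)
```

In this toy setup, each region places its largest attention weight on the word whose feature vector points in the same direction, which is the intuition behind letting different sub-regions draw detail from different words.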

updated: Fri Sep 24 2021 05:31:01 GMT+0000 (UTC)

published: Fri Sep 24 2021 05:31:01 GMT+0000 (UTC)

arXiv
