Self-Supervised Image-to-Text and Text-to-Image Synthesis

Anindya Sundar Das; Sriparna Saha

A comprehensive understanding of vision and language and their interrelation is crucial to realize the underlying similarities and differences between these modalities and to learn more generalized, meaningful representations. In recent years, most work on text-to-image synthesis and image-to-text generation has focused on supervised generative deep architectures, with very little interest placed on learning the similarities between the embedding spaces across modalities. In this paper, we propose a novel self-supervised, deep-learning-based approach to learning the cross-modal embedding spaces for both image-to-text and text-to-image generation. In our approach, we first obtain dense vector representations of images using a StackGAN-based autoencoder model, and dense sentence-level vector representations using an LSTM-based text autoencoder; we then study the mapping from the embedding space of one modality to the embedding space of the other using GAN-based and maximum mean discrepancy (MMD) based generative networks. We also demonstrate, both qualitatively and quantitatively, that our model learns to generate textual descriptions from image data as well as images from textual data.
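
The abstract names maximum mean discrepancy as one of the tools for mapping between embedding spaces but gives no further detail. As a minimal sketch, assuming a Gaussian (RBF) kernel, a hand-picked bandwidth, and the biased estimator (none of which are taken from the paper), the squared MMD between two sets of embedding vectors could be computed as follows. The names `rbf_kernel` and `mmd_squared` are illustrative only.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two vectors (bandwidth is an assumed hyperparameter)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD between samples xs and ys.

    xs and ys are lists of equal-dimension vectors, e.g. mapped image
    embeddings and target text embeddings (an assumed use, for illustration).
    """
    def mean_kernel(a, b):
        total = sum(rbf_kernel(u, v, bandwidth) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Toy usage: two well-separated clusters of 2-D "embeddings".
near_origin = [[0.0, 0.0], [1.0, 0.0]]
far_away = [[5.0, 5.0], [6.0, 5.0]]
print(mmd_squared(near_origin, near_origin))  # identical samples
print(mmd_squared(near_origin, far_away))     # clearly different samples
```

A value close to zero suggests the two samples are similarly distributed under the chosen kernel, so a quantity of this kind could plausibly serve as a training loss between mapped embeddings of one modality and true embeddings of the other.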

updated: Thu Dec 09 2021 13:54:56 GMT+0000 (UTC)

published: Thu Dec 09 2021 13:54:56 GMT+0000 (UTC)

arXiv
