Diverse Image Captioning with Context-Object Split Latent Spaces

Shweta Mahajan; Stefan Roth

Diverse image captioning models aim to learn the one-to-many mappings that are innate to cross-domain datasets, such as those of images and texts. Current methods for this task are based on generative latent variable models, e.g., VAEs with structured latent spaces. Yet the amount of multimodality captured by prior work is limited to that of the paired training data, so the true diversity of the underlying generative process is not fully captured. To address this limitation, we leverage the contextual descriptions in the dataset that explain similar contexts in different visual scenes. To this end, we introduce a novel factorization of the latent space, termed context-object split, to model diversity in contextual descriptions across images and texts within the dataset. Our framework not only enables diverse captioning through context-based pseudo supervision, but also extends this to images with novel objects and without paired captions in the training data. We evaluate our COS-CVAE approach on the standard COCO dataset and on the held-out COCO dataset consisting of images with novel objects, showing significant gains in accuracy and diversity.
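
As a rough illustration of what a latent space factorized into context and object parts could look like in a conditional VAE, the sketch below writes a generic variational lower bound for a two-part latent. The symbols (I for the image, x for the caption, z^m for the context latent, z^o for the object latent) and the particular conditional ordering are assumptions made for illustration. The abstract does not state the objective, so this should not be read as the exact COS-CVAE formulation.

```latex
% Generic conditional VAE bound with a latent split into two parts.
% Symbols are illustrative assumptions, not notation taken from the abstract:
%   I   : image,          x   : caption,
%   z^m : context latent, z^o : object latent.
\begin{align}
q(z^m, z^o \mid x, I) &= q(z^m \mid x, I)\, q(z^o \mid z^m, x, I), \\
p(z^m, z^o \mid I) &= p(z^m \mid I)\, p(z^o \mid z^m, I), \\
\log p(x \mid I) &\ge \mathbb{E}_{q(z^m, z^o \mid x, I)}\big[\log p(x \mid z^m, z^o, I)\big] \notag \\
&\quad - D_{\mathrm{KL}}\big(q(z^m \mid x, I)\,\|\,p(z^m \mid I)\big) \notag \\
&\quad - \mathbb{E}_{q(z^m \mid x, I)}\Big[D_{\mathrm{KL}}\big(q(z^o \mid z^m, x, I)\,\|\,p(z^o \mid z^m, I)\big)\Big].
\end{align}
```

The two KL terms follow from the chain rule for KL divergence applied to the assumed factorization. In a setup like this, the context term could in principle be trained with captions borrowed from other images that share a similar context, which is one way to read the "context-based pseudo supervision" mentioned above.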

updated: Mon Nov 02 2020 13:33:20 GMT+0000 (UTC)

published: Mon Nov 02 2020 13:33:20 GMT+0000 (UTC)

arXiv
