Generalized Zero-Shot Learning using Multimodal Variational Auto-Encoder with Semantic Concepts

Nihar Bendre; Kevin Desai; Peyman Najafirad

Although the amount of available data keeps growing, a central challenge in multimodal learning remains the limited number of labelled samples. For classification, techniques such as meta-learning, zero-shot learning, and few-shot learning can learn about novel classes from prior knowledge. Recent techniques try to learn a cross-modal mapping between the semantic space and the image space. However, they tend to ignore local and global semantic knowledge. To overcome this problem, we propose a Multimodal Variational Auto-Encoder (M-VAE) that learns a shared latent space of image features and the semantic space. In our approach, we concatenate the multimodal data into a single embedding before passing it to the VAE to learn the latent space. We propose using a multimodal loss when the decoder reconstructs the feature embedding. Our approach is capable of correlating modalities and exploiting local and global semantic knowledge for novel sample predictions. Our experimental results using an MLP classifier on four benchmark datasets show that our proposed model outperforms the current state-of-the-art approaches for generalized zero-shot learning.
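To make the pipeline described above concrete, the following is a minimal forward-pass sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact form of the multimodal loss, so the sketch assumes one squared-error reconstruction term per modality plus the standard Gaussian KL term. The single linear layers, the dimensions, the `beta` weight, and the names `linear`, `init_layer`, and `mvae_loss` are all hypothetical.

```python
import math
import random


def linear(x, weights, bias):
    """Affine map; `weights` holds one row of coefficients per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mvae_loss(image_feat, semantic_feat, params, rng, beta=1.0):
    """One forward pass of a toy M-VAE; returns the individual loss terms."""
    # 1. Concatenate both modalities into a single embedding.
    x = image_feat + semantic_feat

    # 2. Encode to the mean and log-variance of the latent Gaussian.
    mu = linear(x, *params["enc_mu"])
    logvar = linear(x, *params["enc_logvar"])

    # 3. Reparameterisation trick: z = mu + sigma * eps.
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

    # 4. Decode back to the concatenated embedding and split it by modality.
    x_hat = linear(z, *params["dec"])
    n_img = len(image_feat)
    img_hat, sem_hat = x_hat[:n_img], x_hat[n_img:]

    # 5. Assumed multimodal loss: one squared-error term per modality.
    rec_image = sum((a - b) ** 2 for a, b in zip(image_feat, img_hat))
    rec_semantic = sum((a - b) ** 2 for a, b in zip(semantic_feat, sem_hat))

    # 6. KL divergence between N(mu, sigma^2) and the standard normal prior.
    kl = -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))

    total = rec_image + rec_semantic + beta * kl
    return {"rec_image": rec_image, "rec_semantic": rec_semantic, "kl": kl, "total": total}


# Toy usage with made-up dimensions: 8 image features, 4 semantic attributes.
rng = random.Random(0)
n_img, n_sem, n_latent = 8, 4, 3
params = {
    "enc_mu": init_layer(n_img + n_sem, n_latent, rng),
    "enc_logvar": init_layer(n_img + n_sem, n_latent, rng),
    "dec": init_layer(n_latent, n_img + n_sem, rng),
}
image_feat = [rng.random() for _ in range(n_img)]
semantic_feat = [rng.random() for _ in range(n_sem)]
losses = mvae_loss(image_feat, semantic_feat, params, rng)
print(losses)
```

Keeping the reconstruction terms separate per modality lets each one be weighted or monitored on its own. In a real model the linear maps would be replaced by trained multi-layer networks in a deep learning framework.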

updated: Sat Jun 26 2021 20:08:37 GMT+0000 (UTC)

published: Sat Jun 26 2021 20:08:37 GMT+0000 (UTC)

arXiv
