We introduce a simple yet effective episode-based training framework for zero-shot learning (ZSL), where the learning system requires to recognize unseen classes given only the corresponding class semantics. During training, the model is trained within a collection of episodes, each of which is designed to simulate a zero-shot classification task. Through training multiple episodes, the model progressively accumulates ensemble experiences on predicting the mimetic unseen classes, which will generalize well on the real unseen classes. Based on this training framework, we propose a novel generative model that synthesizes visual prototypes conditioned on the class semantic prototypes. The proposed model aligns the visual-semantic interactions by formulating both the visual prototype generation and the class semantic inference into an adversarial framework paired with a parameter-economic Multi-modal Cross-Entropy Loss to capture the discriminative information. Extensive experiments on four datasets under both traditional ZSL and generalized ZSL tasks show that our model outperforms the state-of-the-art approaches by large margins.
updated: Thu Apr 02 2020 03:21:04 GMT+0000 (UTC)
published: Sun Sep 08 2019 01:06:15 GMT+0000 (UTC)