Visual concepts (e.g., red apple, big elephant) are often semantically compositional and each element of the compositions can be reused to construct novel concepts (e.g., red elephant). Compositional feature synthesis, which generates image feature distributions exploiting the semantic compositionality, is a promising approach to sample-efficient model generalization. In this work, we propose a task-aware feature generation (TFG) framework for compositional learning, which generates features of novel visual concepts by transferring knowledge from previously seen concepts. These synthetic features are then used to train a classifier to recognize novel concepts in a zero-shot manner. Our novel TFG design injects task-conditioned noise layer-by-layer, producing task-relevant variation at each level. We find the proposed generator design improves classification accuracy and sample efficiency. Our model establishes a new state of the art on three zero-shot compositional learning (ZSCL) benchmarks, outperforming the previous discriminative models by a large margin. Our model improves the performance of the prior arts by over 2x in the generalized ZSCL setting.
updated: Mon Mar 23 2020 23:57:41 GMT+0000 (UTC)
published: Tue Jun 11 2019 23:00:43 GMT+0000 (UTC)