The high cost of acquiring labels is one of the main challenges in deploying supervised machine learning algorithms. Active learning is a promising approach to control the learning process and address the difficulties of data labeling by selecting labeled training examples from a large pool of unlabeled instances. In this paper, we propose a new data-driven approach to active learning by choosing a small set of labeled data points that are both informative and representative. To this end, we present an efficient geometric technique to select a diverse core-set in a low-dimensional latent space obtained by training a Variational Autoencoder (VAE). Our experiments demonstrate an improvement in accuracy over two related techniques and, more importantly, signify the representation power of generative modeling for developing new active learning methods in high-dimensional data settings.
updated: Mon Nov 18 2019 15:42:20 GMT+0000 (UTC)
published: Mon Nov 18 2019 15:42:20 GMT+0000 (UTC)