Recent studies in image retrieval task have shown that ensembling different models and combining multiple global descriptors lead to performance improvement. However, training different models for the ensemble is not only difficult but also inefficient with respect to time and memory. In this paper, we propose a novel framework that exploits multiple global descriptors to get an ensemble effect while it can be trained in an end-to-end manner. The proposed framework is flexible and expandable by the global descriptor, CNN backbone, loss, and dataset. Moreover, we investigate the effectiveness of combining multiple global descriptors with quantitative and qualitative analysis. Our extensive experiments show that the combined descriptor outperforms a single global descriptor, as it can utilize different types of feature properties. In the benchmark evaluation, the proposed framework achieves the state-of-the-art performance on the CARS196, CUB200-2011, In-shop Clothes, and Stanford Online Products on image retrieval tasks. Our model implementations and pretrained models are publicly available.
updated: Thu Apr 23 2020 06:20:02 GMT+0000 (UTC)
published: Tue Mar 26 2019 03:38:38 GMT+0000 (UTC)