Many unsupervised approaches have been proposed recently for the video-based re-identification problem since annotations of samples across cameras are time-consuming. However, higher-order relationships across the entire camera network are ignored by these methods, leading to contradictory outputs when matching results from different camera pairs are combined. In this paper, we address the problem of unsupervised video-based re-identification by proposing a consistent cross-view matching (CCM) framework, in which global camera network constraints are exploited to guarantee the matched pairs are with consistency. Specifically, we first propose to utilize the first neighbor of each sample to discover relations among samples and find the groups in each camera. Additionally, a cross-view matching strategy followed by global camera network constraints is proposed to explore the matching relationships across the entire camera network. Finally, we learn metric models for camera pairs progressively by alternatively mining consistent cross-view matching pairs and updating metric models using these obtained matches. Rigorous experiments on two widely-used benchmarks for video re-identification demonstrate the superiority of the proposed method over current state-of-the-art unsupervised methods; for example, on the MARS dataset, our method achieves an improvement of 4.2% over unsupervised methods, and even 2.5% over one-shot supervision-based methods for rank-1 accuracy.
updated: Sat Dec 12 2020 04:47:42 GMT+0000 (UTC)
published: Tue Aug 27 2019 22:35:43 GMT+0000 (UTC)