Few-Shot Deep Adversarial Learning for Video-based Person Re-identification

Lin Wu; Yang Wang; Hongzhi Yin; Meng Wang; Ling Shao

Video-based person re-identification (re-ID) refers to matching people across camera views from arbitrary, unaligned video footage. Existing methods rely on supervision signals to optimise a projected space in which inter-video distances are maximised and intra-video distances are minimised. However, this demands exhaustively labelling people across camera views, which prevents these methods from scaling to large camera networks. Moreover, learning effective video representations with view invariance is not explicitly addressed, so features from different views otherwise exhibit different distributions. Thus, matching videos for person re-ID demands flexible models that capture the dynamics of time-series observations and learn view-invariant representations from limited labelled training samples. In this paper, we propose a novel few-shot deep learning approach to video-based person re-ID that learns comparable representations that are both discriminative and view-invariant. The proposed method builds on variational recurrent neural networks (VRNNs) and is trained adversarially to produce latent variables with temporal dependencies that are highly discriminative yet view-invariant for matching persons. Through extensive experiments on three benchmark datasets, we empirically show that our method produces view-invariant temporal features and achieves state-of-the-art performance.

updated: Thu Sep 12 2019 05:23:08 GMT+0000 (UTC)

published: Fri Mar 29 2019 08:45:59 GMT+0000 (UTC)

arXiv
