Graph Sampling Based Deep Metric Learning for Generalizable Person Re-Identification

Shengcai Liao; Ling Shao

For person re-identification, existing deep networks often focus on representation learning. However, without transfer learning, the learned model is fixed once trained and cannot adapt to the variety of unseen scenarios. In this paper, going beyond representation learning, we consider how to formulate person image matching directly on deep feature maps. We treat image matching as finding local correspondences in feature maps, and construct query-adaptive convolution kernels on the fly to achieve local matching. As a result, the matching process and its results are interpretable, and this explicit matching generalizes better than representation features to unseen scenarios such as unknown misalignments and pose or viewpoint changes. To facilitate end-to-end training of this architecture, we further build a class memory module that caches the feature maps of the most recent samples of each class, so that image matching losses can be computed for metric learning. In direct cross-dataset evaluation, the proposed Query-Adaptive Convolution (QAConv) method improves substantially over popular learning methods (by about 10% mAP or more) and achieves results comparable to many transfer learning methods. In addition, we propose TLift, a model-free score weighting method based on temporal co-occurrence, which further improves performance and achieves state-of-the-art results in cross-dataset person re-identification. Code is available at https://github.com/ShengcaiLiao/QAConv.
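
The local matching idea in the abstract can be illustrated with a minimal sketch. It is not the released QAConv implementation: the function names `l2_normalize` and `local_matching_score` are hypothetical, feature maps are assumed to be flattened into lists of per-location channel vectors, and a plain average stands in for whatever learned layers turn the matching responses into a final similarity. Each query location is used as a 1x1 kernel, correlated with every gallery location, and the strongest response is kept in both directions.

```python
import math


def l2_normalize(vec):
    """Scale a channel vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def local_matching_score(query_map, gallery_map):
    """Simplified local matching between two feature maps (illustrative only).

    Each map is a list of per-location channel vectors, i.e. an (H*W) x C layout.
    Every query location acts as a 1x1 kernel that is correlated with all
    gallery locations. The best response per location is kept in both
    directions, and the kept responses are averaged (an assumption made here
    in place of any learned scoring layers).
    """
    q = [l2_normalize(v) for v in query_map]
    g = [l2_normalize(v) for v in gallery_map]

    # responses[i][j] = cosine similarity of query location i and gallery location j
    responses = [[sum(a * b for a, b in zip(qv, gv)) for gv in g] for qv in q]

    best_for_query = [max(row) for row in responses]            # max over gallery locations
    best_for_gallery = [max(col) for col in zip(*responses)]    # max over query locations

    best = best_for_query + best_for_gallery
    return sum(best) / len(best)


if __name__ == "__main__":
    # Same local patterns at swapped locations: the score ignores the misalignment.
    query = [[1.0, 0.0], [0.0, 1.0]]
    gallery = [[0.0, 2.0], [3.0, 0.0]]
    print(local_matching_score(query, gallery))
```

Because the maximum is taken over all locations, the score does not depend on where a matching local pattern appears in the gallery map, which is one way to read the abstract's claim about robustness to misalignment.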

updated: Sun Apr 04 2021 06:44:15 GMT+0000 (UTC)

published: Sun Apr 04 2021 06:44:15 GMT+0000 (UTC)

arXiv
