Self-Supervised Pre-Training for Transformer-Based Person Re-Identification

Hao Luo; Pichao Wang; Yi Xu; Feng Ding; Yanxin Zhou; Fan Wang; Hao Li; Rong Jin

Transformer-based supervised pre-training achieves strong performance in person re-identification (ReID). However, because of the domain gap between ImageNet and ReID datasets, and because of the transformer's strong data-fitting ability, a larger pre-training dataset (e.g., ImageNet-21K) is usually needed to boost performance. To address this challenge, this work aims to mitigate the gap between the pre-training and ReID datasets from the perspectives of data and model structure. We first investigate self-supervised learning (SSL) methods with Vision Transformers (ViT) pre-trained on unlabelled person images (the LUPerson dataset), and empirically find that they significantly surpass ImageNet supervised pre-training models on ReID tasks. To further reduce the domain gap and accelerate pre-training, we propose the Catastrophic Forgetting Score (CFS) to evaluate the gap between pre-training and fine-tuning data. Based on CFS, a subset of the pre-training dataset is selected by sampling relevant data close to the downstream ReID data and filtering out irrelevant data. For the model structure, we propose a ReID-specific module named the IBN-based convolution stem (ICS), which bridges the domain gap by learning more invariant features. Extensive experiments fine-tune the pre-trained models under supervised learning, unsupervised domain adaptation (UDA), and unsupervised learning (USL) settings. We successfully downscale the LUPerson dataset to 50% with no performance degradation. Finally, we achieve state-of-the-art performance on Market-1501 and MSMT17. For example, our ViT-S/16 achieves 91.3%/89.9%/89.6% mAP on Market-1501 for supervised/UDA/USL ReID. Code and models will be released at https://github.com/michuanhaohao/TransReID-SSL.
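
The score-based subset selection described above can be illustrated with a minimal sketch. The abstract does not say how CFS is computed or which direction of the score indicates relevance, so the sketch assumes a precomputed per-image relevance score where a higher value means the image is closer to the downstream ReID data. The function name `select_subset` and the parameter `keep_ratio` are hypothetical and are not taken from the authors' code.

```python
from typing import List, Sequence


def select_subset(scores: Sequence[float], keep_ratio: float = 0.5) -> List[int]:
    """Return indices of the top-scoring images, in their original order.

    Assumption: a higher score means the image is more relevant to the
    downstream data. This stands in for a CFS-derived relevance score,
    whose actual definition is not given in the abstract.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_items = len(scores)
    if n_items == 0:
        return []
    # Keep at least one image so the subset is never empty.
    n_keep = max(1, int(n_items * keep_ratio))
    # Rank indices from most to least relevant, then keep the top n_keep.
    ranked = sorted(range(n_items), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical relevance scores for six pre-training images.
example_scores = [0.9, 0.1, 0.5, 0.7, 0.3, 0.2]
kept = select_subset(example_scores, keep_ratio=0.5)
```

With `keep_ratio=0.5`, the sketch keeps the half of the images with the highest assumed relevance, mirroring the 50% downscaling of LUPerson reported in the abstract.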

updated: Tue Nov 23 2021 18:59:08 GMT+0000 (UTC)

published: Tue Nov 23 2021 18:59:08 GMT+0000 (UTC)

arXiv
