TransReID: Transformer-based Object Re-Identification

Shuting He; Hao Luo; Pichao Wang; Fan Wang; Hao Li; Wei Jiang

In this paper, we explore the Vision Transformer (ViT), a pure transformer-based model, for the object re-identification (ReID) task. With several adaptations, a strong baseline, ViT-BoT, is constructed with ViT as the backbone, which achieves results comparable to convolutional neural network (CNN) based frameworks on several ReID benchmarks. Furthermore, two modules are designed in consideration of the specific characteristics of ReID data: (1) It is natural and simple for a Transformer to encode non-visual information such as camera or viewpoint into vector embedding representations. By plugging in these embeddings, ViT is able to eliminate the bias caused by diverse cameras or viewpoints. (2) We design a Jigsaw branch, parallel to the Global branch, to facilitate the training of the model in a two-branch learning framework. In the Jigsaw branch, a jigsaw patch module is designed to learn robust feature representations and help the training of the transformer by shuffling the patches. With these novel modules, we propose a pure-transformer framework dubbed TransReID, which is, to the best of our knowledge, the first work to use a pure Transformer for ReID research. Experimental results of TransReID are very promising, achieving state-of-the-art performance on both person and vehicle ReID benchmarks.
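
The two modules named in the abstract can be illustrated with a toy sketch. The abstract does not specify how the embeddings are combined or how the patches are shuffled, so everything here is an assumption for illustration only: the helper names `add_camera_embedding` and `shuffle_into_groups`, the additive combination with a `weight` factor, and the plain random permutation split into equal groups are all hypothetical and are not the authors' implementation. Plain Python lists stand in for tensors.

```python
import random


def add_camera_embedding(patch_tokens, camera_id, camera_table, weight=1.0):
    """Add one camera-specific vector to every patch token.

    Assumption: side information is injected additively; in a real model
    camera_table would hold learned parameters, here it is fixed by hand.
    """
    cam_vec = camera_table[camera_id]
    return [[x + weight * c for x, c in zip(token, cam_vec)]
            for token in patch_tokens]


def shuffle_into_groups(patch_tokens, num_groups, rng):
    """Randomly permute the patch tokens, then split them into equal groups.

    Assumption: a uniform random permutation; the abstract only says that
    patches are shuffled, not how.
    """
    if len(patch_tokens) % num_groups != 0:
        raise ValueError("number of patches must be divisible by num_groups")
    shuffled = list(patch_tokens)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    size = len(shuffled) // num_groups
    return [shuffled[i * size:(i + 1) * size] for i in range(num_groups)]


# Toy data: 8 patch tokens of dimension 4, and two hypothetical cameras.
dim = 4
patches = [[float(i)] * dim for i in range(8)]
camera_table = {0: [0.0] * dim, 1: [0.5] * dim}

tokens = add_camera_embedding(patches, camera_id=1, camera_table=camera_table)
groups = shuffle_into_groups(tokens, num_groups=2, rng=random.Random(0))
print(len(groups), [len(g) for g in groups])
```

In this sketch each shuffled group mixes patches from different image regions, which is one plausible reading of how shuffling could encourage features that do not depend on a fixed spatial layout.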

updated: Mon Feb 08 2021 17:33:59 GMT+0000 (UTC)

published: Mon Feb 08 2021 17:33:59 GMT+0000 (UTC)

arXiv
