Person Re-Identification with a Locally Aware Transformer

Charu Sharma; Siddhant R. Kapil; David Chapman

Person re-identification (re-ID) is an important problem in computer vision-based surveillance applications, in which the goal is to identify the same person across surveillance photographs taken in a variety of nearby zones. At present, the majority of person re-ID techniques are based on Convolutional Neural Networks (CNNs), but Vision Transformers are beginning to displace pure CNNs for a variety of object recognition tasks. The primary output of a vision transformer is a global classification token, but vision transformers also yield local tokens that contain additional information about local regions of the image. Techniques that use these local tokens to improve classification accuracy are an active area of research. We propose a novel Locally Aware Transformer (LA-Transformer) that employs a Parts-based Convolution Baseline (PCB)-inspired strategy for aggregating globally enhanced local classification tokens into an ensemble of N classifiers, where N is the number of patches. An additional novelty is that we incorporate blockwise fine-tuning, which further improves re-ID accuracy. LA-Transformer with blockwise fine-tuning achieves a rank-1 accuracy of 98.27% with a standard deviation of 0.13 on the Market-1501 dataset and 98.7% with a standard deviation of 0.2 on the CUHK03 dataset, outperforming all other state-of-the-art published methods at the time of writing.
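The aggregation idea in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the fusion rule or the way the N classifier outputs are combined, so several things here are assumptions made only for illustration: the weighted average used to "globally enhance" each local token, the mixing weight `lam`, the use of plain linear classifiers, and the averaging of logits. The function names `enhance_local_tokens` and `ensemble_logits` are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def enhance_local_tokens(tokens, lam=0.8):
    """Blend the global token into every local token.

    tokens: array of shape (1 + N, D); row 0 is the global classification
    token and rows 1..N are the local (patch) tokens.
    The weighted average below is an assumed fusion rule, not one stated
    in the abstract.
    """
    global_tok = tokens[0]
    local_toks = tokens[1:]
    return (local_toks + lam * global_tok) / (1.0 + lam)


def ensemble_logits(enhanced, weights, biases):
    """Apply one linear classifier per enhanced local token.

    enhanced: (N, D), weights: (N, D, C), biases: (N, C).
    Returns the per-classifier logits (N, C) and their mean (C,).
    Averaging the logits is an assumption made for this sketch.
    """
    per_part = np.einsum("nd,ndc->nc", enhanced, weights) + biases
    return per_part, per_part.mean(axis=0)


# Toy sizes: N local tokens of dimension D, C identity classes.
rng = np.random.default_rng(0)
N, D, C = 4, 8, 3
tokens = rng.normal(size=(1 + N, D))   # stand-in for transformer output
weights = rng.normal(size=(N, D, C))   # one weight matrix per classifier
biases = np.zeros((N, C))

enhanced = enhance_local_tokens(tokens)
per_part, combined = ensemble_logits(enhanced, weights, biases)
print(per_part.shape, combined.shape)
```

In this sketch each of the N classifiers sees a different local token that has already been mixed with the global token, which mirrors the PCB-style "one classifier per part" structure described above. Training, blockwise fine-tuning, and the backbone itself are outside its scope.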

updated: Mon Jun 07 2021 15:31:19 GMT+0000 (UTC)

published: Mon Jun 07 2021 15:31:19 GMT+0000 (UTC)

arXiv
