Learning Disentangled Representation Implicitly via Transformer for Occluded Person Re-Identification

Mengxi Jia; Xinhua Cheng; Shijian Lu; Jian Zhang


Person re-identification (re-ID) under occlusion is a long-standing challenge, because person images with different types of occlusion are often misaligned, which hampers image matching and ranking. Most existing methods tackle this challenge by aligning the spatial features of body parts according to external semantic cues or feature similarities, but this alignment approach is complicated and sensitive to noise. We design DRL-Net, a disentangled representation learning network that handles occluded re-ID without requiring strict person image alignment or any additional supervision. Leveraging transformer architectures, DRL-Net achieves alignment-free re-ID via global reasoning over the local features of occluded person images. It measures image similarity by automatically disentangling the representations of undefined semantic components, e.g., human body parts or obstacles, under the guidance of semantic preference object queries in the transformer. In addition, we design a decorrelation constraint in the transformer decoder and impose it on the object queries so that they focus on different semantic components. To better eliminate interference from occlusions, we design a contrast feature learning (CFL) technique that separates occlusion features from discriminative ID features. Extensive experiments on occluded and holistic re-ID benchmarks (Occluded-DukeMTMC, Market1501 and DukeMTMC) show that DRL-Net consistently achieves superior re-ID performance and outperforms the state of the art by large margins on Occluded-DukeMTMC.
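
The abstract does not give the exact form of the decorrelation constraint. As a rough illustration only, one common way to push a set of query vectors toward different directions is to penalise their pairwise cosine similarity. The sketch below assumes that form; the function name `decorrelation_penalty` and the choice of a mean squared cosine penalty are hypothetical and are not taken from the paper.

```python
import math


def decorrelation_penalty(queries):
    """Mean squared cosine similarity over all distinct pairs of query vectors.

    Assumed illustrative form, not the paper's definition: the value is 0.0
    when all queries are mutually orthogonal and approaches 1.0 when they
    all point in the same (or exactly opposite) direction.
    """
    # L2-normalise each query so that a dot product equals cosine similarity.
    unit = []
    for q in queries:
        norm = math.sqrt(sum(x * x for x in q))
        if norm == 0.0:
            raise ValueError("query vectors must be non-zero")
        unit.append([x / norm for x in q])

    n = len(unit)
    if n < 2:
        return 0.0  # a single query has nothing to be decorrelated from

    # Accumulate squared cosine similarity over each unordered pair (i < j).
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            cos = sum(a * b for a, b in zip(unit[i], unit[j]))
            total += cos * cos
    return total / (n * (n - 1) / 2)
```

Under this assumption, adding such a term to the training loss would discourage two object queries from attending to the same semantic component; in practice it would be written with tensor operations in a deep learning framework rather than plain Python lists.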

updated: Tue Jul 06 2021 04:24:10 GMT+0000 (UTC)

published: Tue Jul 06 2021 04:24:10 GMT+0000 (UTC)

arXiv
