Pose-guided Feature Disentangling for Occluded Person Re-identification Based on Transformer

Tao Wang; Hong Liu; Pinhao Song; Tianyu Guo; Wei Shi

Occluded person re-identification is a challenging task because human body parts can be occluded by obstacles (e.g. trees, cars, and pedestrians) in certain scenes. Some existing pose-guided methods solve this problem by aligning body parts through graph matching, but these graph-based methods are complicated and not intuitive. Therefore, we propose a transformer-based Pose-guided Feature Disentangling (PFD) method that uses pose information to clearly disentangle semantic components (e.g. human body or joint parts) and selectively match the corresponding non-occluded parts. First, a Vision Transformer (ViT) is used to extract patch features, drawing on its strong capability. Second, to preliminarily disentangle the pose information from the patch information, a matching and distributing mechanism is applied in the Pose-guided Feature Aggregation (PFA) module. Third, a set of learnable semantic views is introduced in the transformer decoder to implicitly enhance the disentangled body part features. However, those semantic views are not guaranteed to be related to the body without additional supervision. Therefore, a Pose-View Matching (PVM) module is proposed to explicitly match visible body parts and automatically separate occlusion features. Fourth, to better prevent interference from occlusions, we design a Pose-guided Push Loss that emphasizes the features of visible body parts. Extensive experiments over five challenging datasets for two tasks (occluded and holistic Re-ID) demonstrate that the proposed PFD is promising and performs favorably against state-of-the-art methods. Code is available at https://github.com/WangTaoAs/PFD_Net
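
The matching idea behind the PVM step can be illustrated with a small sketch. The abstract does not say how views and body parts are compared or how visibility is decided, so everything here is an assumption made for illustration: cosine similarity as the matching score, a per-part confidence value from a pose estimator, a fixed threshold of 0.5, and the hypothetical names `cosine` and `match_views_to_parts`. It is not the authors' implementation, which is available at the repository linked above.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_views_to_parts(views, parts, confidences, threshold=0.5):
    """Assign each view to its most similar part, then split by part visibility.

    views:       list of view feature vectors (assumed decoder outputs)
    parts:       list of pose-guided part feature vectors (assumed)
    confidences: one assumed pose-estimator confidence per part
    Returns (visible, occluded), each a list of (view_index, part_index) pairs.
    """
    visible, occluded = [], []
    for i, view in enumerate(views):
        # Pick the part whose feature is most similar to this view.
        j = max(range(len(parts)), key=lambda k: cosine(view, parts[k]))
        # A low-confidence part is treated as occluded (assumed rule).
        (visible if confidences[j] >= threshold else occluded).append((i, j))
    return visible, occluded


# Toy example: three views, two parts, the second part is occluded.
views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
parts = [[2.0, 0.1], [0.1, 3.0]]
confidences = [0.9, 0.2]
visible, occluded = match_views_to_parts(views, parts, confidences)
print(visible)   # [(0, 0), (2, 0)]
print(occluded)  # [(1, 1)]
```

In this toy setup only the views matched to the confident part would contribute to the final comparison between two images, which mirrors the stated goal of matching visible parts while setting occlusion features aside.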

updated: Sun Dec 05 2021 03:23:31 GMT+0000 (UTC)

published: Sun Dec 05 2021 03:23:31 GMT+0000 (UTC)

arXiv
