Cross-modal Local Shortest Path and Global Enhancement for Visible-Thermal Person Re-Identification

Xiaohong Wang; Chaoqi Li; Xiangcai Ma

Beyond the recognition difficulties caused by human pose and occlusion, the visible-thermal cross-modal person re-identification (VT-ReID) task must also resolve the modality differences introduced by different imaging systems. In this paper, we propose the Cross-modal Local Shortest Path and Global Enhancement (CM-LSP-GE) modules, a two-stream network based on joint learning of local and global features. The core idea is to use local feature alignment to address occlusion, and to address the modality difference by strengthening the global feature. First, an attention-based two-stream ResNet network is designed to extract dual-modality features and map them to a unified feature space. Second, to address cross-modal person pose and occlusion problems, the image is cut horizontally into several equal parts to obtain local features, and the shortest path between the local features of two graphs is used to achieve fine-grained local feature alignment. Third, a batch normalization enhancement module applies an enhancement strategy to the global features, which enlarges the differences between classes. A multi-granularity loss fusion strategy further improves the performance of the algorithm. Finally, a joint learning mechanism over local and global features is used to improve cross-modal person re-identification accuracy. Experimental results on two typical datasets show that our model clearly outperforms most state-of-the-art methods. In particular, on the SYSU-MM01 dataset, our model achieves gains of 2.89% in Rank-1 and 7.96% in mAP, respectively, in the all-search mode. The source code will be released soon.
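The abstract does not give the exact formulation of the local shortest path, so the following is only a minimal sketch of the general idea: each image is represented by one feature vector per horizontal stripe, and the alignment cost is the cheapest monotone path through the stripe-to-stripe distance matrix. The function names, the use of plain Euclidean distance, and the restriction to right/down moves are assumptions made for illustration, not details confirmed by the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def pairwise_distances(a: Sequence[Vector], b: Sequence[Vector]) -> List[List[float]]:
    """Euclidean distance between every stripe of `a` and every stripe of `b`.

    Assumption: the paper may normalize or transform these distances;
    raw Euclidean distance is used here for simplicity.
    """
    return [[math.dist(x, y) for y in b] for x in a]


def local_shortest_path(a: Sequence[Vector], b: Sequence[Vector]) -> float:
    """Cost of the cheapest path from the top-left to the bottom-right
    cell of the distance matrix, moving only right or down.

    Hypothetical helper: a dynamic-programming sketch of stripe alignment,
    not the authors' released implementation.
    """
    d = pairwise_distances(a, b)
    rows, cols = len(d), len(d[0])
    cost = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if i == 0 and j == 0:
                best_prev = 0.0
            elif i == 0:
                best_prev = cost[0][j - 1]      # can only come from the left
            elif j == 0:
                best_prev = cost[i - 1][0]      # can only come from above
            else:
                best_prev = min(cost[i - 1][j], cost[i][j - 1])
            cost[i][j] = best_prev + d[i][j]
    return cost[-1][-1]


if __name__ == "__main__":
    # Toy local features: three horizontal stripes, two dimensions each.
    visible = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
    thermal = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
    print(local_shortest_path(visible, thermal))
```

Because the path may step right or down independently, a stripe in one image can be matched against a neighbouring stripe in the other, which is what lets this kind of alignment tolerate vertical misalignment from pose changes or partial occlusion.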

updated: Thu Jun 09 2022 10:27:22 GMT+0000 (UTC)

published: Thu Jun 09 2022 10:27:22 GMT+0000 (UTC)

arXiv
