Towards Homogeneous Modality Learning and Multi-Granularity Information Exploration for Visible-Infrared Person Re-Identification

Haojie Liu; Daoxun Xia; Wei Jiang; Chao Xu

Visible-infrared person re-identification (VI-ReID) is a challenging and essential task that aims to retrieve a set of person images across visible and infrared camera views. To mitigate the large modality discrepancy between heterogeneous images, previous methods apply generative adversarial networks (GANs) to generate modality-consistent data. However, due to severe color variations between the visible and infrared domains, the generated fake cross-modality samples are often not of sufficient quality to fill the modality gap between synthesized scenarios and real target ones, which leads to sub-optimal feature representations. In this work, we address the cross-modality matching problem with Aligned Grayscale Modality (AGM), a unified dark-line spectrum that reformulates visible-infrared dual-mode learning as a gray-gray single-mode learning problem. Specifically, we generate the grayscale modality from the homogeneous visible images. Then, we train a style transfer model to transfer infrared images into homogeneous grayscale images. In this way, the modality discrepancy is significantly reduced in the image space. To reduce the remaining appearance discrepancy, we further introduce a multi-granularity feature extraction network to conduct feature-level alignment. Rather than relying on global information alone, we propose to exploit local (head-shoulder) features to assist person Re-ID; the two complement each other to form a stronger feature descriptor. Comprehensive experiments on the mainstream evaluation datasets, including SYSU-MM01 and RegDB, indicate that our method can significantly boost cross-modality retrieval performance over state-of-the-art methods.
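The first AGM step, generating a grayscale modality from visible images, can be illustrated with a minimal sketch. The abstract does not state which conversion is used, so the ITU-R BT.601 luma weights, the replication of the gray value across three channels, and the function name `to_aligned_grayscale` are all assumptions made for illustration, not the authors' implementation. The style transfer model for infrared images is not sketched here.

```python
# Hypothetical sketch: visible (RGB) image -> 3-channel grayscale image.
# Assumption: ITU-R BT.601 luma weights; the abstract does not specify them.
LUMA_WEIGHTS = (0.299, 0.587, 0.114)


def to_aligned_grayscale(image):
    """Convert an H x W image of (R, G, B) tuples to grayscale.

    The gray value is replicated across three channels (an assumption) so the
    output keeps the layout that a standard RGB backbone expects.
    """
    wr, wg, wb = LUMA_WEIGHTS
    result = []
    for row in image:
        new_row = []
        for r, g, b in row:
            gray = round(wr * r + wg * g + wb * b)
            new_row.append((gray, gray, gray))
        result.append(new_row)
    return result


# Tiny 2 x 2 example image with 8-bit channel values.
visible = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]
gray = to_aligned_grayscale(visible)
```

After this mapping every pixel carries no color information, which is the sense in which visible images and style-transferred infrared images could share a single gray modality.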

updated: Mon Apr 11 2022 03:03:19 GMT+0000 (UTC)

published: Mon Apr 11 2022 03:03:19 GMT+0000 (UTC)

arXiv
