How Image Generation Helps Visible-to-Infrared Person Re-Identification?

Honghu Pan; Yongyong Chen; Yunqi He; Xin Li; Zhenyu He

Compared to visible-to-visible (V2V) person re-identification (ReID), the visible-to-infrared (V2I) person ReID task is more challenging because training samples are scarce and the cross-modality discrepancy is large. To address both issues, we propose Flow2Flow, a unified framework that jointly achieves training sample expansion and cross-modality image generation for V2I person ReID. Specifically, Flow2Flow learns bijective transformations from both the visible image domain and the infrared image domain to a shared isotropic Gaussian domain, using an invertible flow-based generator for each modality. With Flow2Flow, we can generate pseudo training samples by transforming latent Gaussian noise into visible or infrared images, and generate cross-modality images by transforming an image of the existing modality into latent Gaussian noise and then into an image of the missing modality. To align the generated images in both identity and modality, we develop adversarial training strategies to train Flow2Flow. Specifically, we design an image encoder and a modality discriminator for each modality. The image encoder encourages the generated images to be similar to real images of the same identity via identity adversarial training, and the modality discriminator makes the generated images indistinguishable in modality from real images via modality adversarial training. Experimental results on SYSU-MM01 and RegDB demonstrate that both training sample expansion and cross-modality image generation can significantly improve V2I ReID accuracy.
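The two generation paths described in the abstract can be illustrated with a toy sketch. This is not the authors' architecture: the class `AffineCouplingFlow`, the helpers `expand_samples` and `translate`, the 8-dimensional "images", and the fixed random weights are all assumptions made for illustration, and the adversarial training is omitted. Each flow is a single invertible affine coupling layer, so the only property shown is that two bijections sharing one Gaussian latent space support both sampling and cross-modality translation.

```python
import numpy as np


class AffineCouplingFlow:
    """Toy invertible flow: one affine coupling layer on flattened vectors.

    Hypothetical stand-in for a modality-specific flow-based generator.
    """

    def __init__(self, dim, seed):
        rng = np.random.default_rng(seed)
        self.half = dim // 2
        rest = dim - self.half
        # Fixed random weights stand in for learned parameters (assumption).
        self.w_scale = rng.normal(scale=0.1, size=(self.half, rest))
        self.w_shift = rng.normal(scale=0.1, size=(self.half, rest))

    def forward(self, x):
        """Map an image vector to the shared latent space."""
        x1, x2 = x[..., :self.half], x[..., self.half:]
        log_s = np.tanh(x1 @ self.w_scale)
        shift = x1 @ self.w_shift
        z2 = x2 * np.exp(log_s) + shift
        return np.concatenate([x1, z2], axis=-1)

    def inverse(self, z):
        """Map a latent vector back to an image vector (exact inverse)."""
        z1, z2 = z[..., :self.half], z[..., self.half:]
        log_s = np.tanh(z1 @ self.w_scale)
        shift = z1 @ self.w_shift
        x2 = (z2 - shift) * np.exp(-log_s)
        return np.concatenate([z1, x2], axis=-1)


def expand_samples(flow, n, dim, rng):
    """Training sample expansion: Gaussian noise -> pseudo image."""
    z = rng.standard_normal((n, dim))
    return flow.inverse(z)


def translate(src_flow, dst_flow, x):
    """Cross-modality generation: source image -> latent -> target image."""
    return dst_flow.inverse(src_flow.forward(x))


DIM = 8  # toy dimensionality, not a real image size
visible_flow = AffineCouplingFlow(DIM, seed=0)
infrared_flow = AffineCouplingFlow(DIM, seed=1)

rng = np.random.default_rng(42)
x_visible = rng.standard_normal((4, DIM))  # placeholder "visible images"

pseudo_infrared = expand_samples(infrared_flow, 3, DIM, rng)
x_fake_ir = translate(visible_flow, infrared_flow, x_visible)
x_cycle = translate(infrared_flow, visible_flow, x_fake_ir)
```

Because both maps are bijective, translating visible to infrared and back recovers the input up to floating-point error, which is what `x_cycle` checks in this sketch. Whether a translated image keeps the right identity and looks like the target modality is a separate matter, which the abstract addresses through identity and modality adversarial training.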

updated: Tue Oct 04 2022 13:09:29 GMT+0000 (UTC)

published: Tue Oct 04 2022 13:09:29 GMT+0000 (UTC)

arXiv
