Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting

Lingbo Liu; Jiaqi Chen; Hefeng Wu; Guanbin Li; Chenglong Li; Liang Lin

Crowd counting is a fundamental yet challenging task that requires rich information to generate pixel-wise crowd density maps. However, most previous methods use only the limited information in RGB images and cannot adequately discover potential pedestrians in unconstrained scenarios. In this work, we find that incorporating optical and thermal information can greatly help in recognizing pedestrians. To promote future research in this field, we introduce a large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030 pairs of RGB-thermal images with 138,389 annotated people. Furthermore, to facilitate multimodal crowd counting, we propose a cross-modal collaborative representation learning framework, which consists of multiple modality-specific branches, a modality-shared branch, and an Information Aggregation-Distribution Module (IADM) that fully captures the complementary information of different modalities. Specifically, our IADM incorporates two collaborative information transfers to dynamically enhance the modality-shared and modality-specific representations with a dual information propagation mechanism. Extensive experiments on the RGBT-CC benchmark demonstrate the effectiveness of our framework for RGBT crowd counting. Moreover, the proposed approach is universal for multimodal crowd counting and also achieves superior performance on the ShanghaiTechRGBD dataset. Finally, our source code and benchmark are released at http://lingboliu.com/RGBT_Crowd_Counting.html.
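
To make the aggregation-distribution idea concrete, here is a minimal toy sketch, assuming features are plain lists of floats and that each transfer is a simple gated residual update. The gates `alpha` and `beta` are fixed numbers here, and every function name is hypothetical; the actual IADM learns its transfers from data, so this illustrates only the two-way flow (modality-specific branches into the shared branch, then back). The last helper reflects the general practice in density-map counting of taking the estimated count as the sum over the predicted density map.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def aggregate(shared: Vector, specific: Dict[str, Vector], alpha: float) -> Vector:
    """Aggregation step (hypothetical): fold modality-specific features into the shared one.

    Each modality contributes its residual (specific - shared); residuals are
    averaged over modalities and scaled by a fixed gate `alpha`.
    """
    if not specific:
        raise ValueError("need at least one modality-specific feature")
    n = len(specific)
    out = []
    for i, s in enumerate(shared):
        residual = sum(feat[i] - s for feat in specific.values()) / n
        out.append(s + alpha * residual)
    return out


def distribute(shared: Vector, specific: Dict[str, Vector], beta: float) -> Dict[str, Vector]:
    """Distribution step (hypothetical): push the enhanced shared feature back to each branch."""
    return {
        name: [f + beta * (s - f) for f, s in zip(feat, shared)]
        for name, feat in specific.items()
    }


def exchange(shared: Vector, specific: Dict[str, Vector],
             alpha: float = 0.5, beta: float = 0.5) -> Tuple[Vector, Dict[str, Vector]]:
    """One round of the two-way transfer: aggregate first, then distribute."""
    new_shared = aggregate(shared, specific, alpha)
    new_specific = distribute(new_shared, specific, beta)
    return new_shared, new_specific


def count_from_density(density: List[Vector]) -> float:
    """Estimated crowd count: the sum of all values in a density map."""
    return sum(sum(row) for row in density)


if __name__ == "__main__":
    shared, spec = exchange([0.0, 0.0], {"rgb": [1.0, 2.0], "thermal": [3.0, 0.0]})
    print(shared)  # [1.0, 0.5]
    print(spec)    # {'rgb': [1.0, 1.25], 'thermal': [2.0, 0.25]}
    print(count_from_density([[0.5, 0.25], [1.0, 0.25]]))  # 2.0
```

Aggregating before distributing means each branch receives information that already blends the other modality, which is the intuition behind the complementary RGB-thermal exchange described above; a real model would replace the fixed gates with learned layers.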

updated: Tue Apr 06 2021 03:02:31 GMT+0000 (UTC)

published: Tue Dec 08 2020 16:18:29 GMT+0000 (UTC)

arXiv