IIITD-20K: Dense captioning for Text-Image ReID

A V Subramanyam; Niranjan Sundararajan; Vibhu Dubey; Brejesh Lall

Text-to-Image (T2I) ReID has attracted considerable attention in recent years. CUHK-PEDES, RSTPReid and ICFG-PEDES are the three available benchmarks for evaluating T2I ReID methods. RSTPReid and ICFG-PEDES comprise identities from MSMT17, but because the number of unique persons is small, their diversity is limited. CUHK-PEDES, on the other hand, comprises 13,003 identities but has relatively short text descriptions on average. Further, these datasets are captured in restricted environments with a limited number of cameras. To further diversify the identities and provide dense captions, we propose a novel dataset called IIITD-20K. IIITD-20K comprises 20,000 unique identities captured in the wild and provides a rich dataset for text-to-image ReID. With a minimum of 26 words per description, each image is densely captioned. We further synthetically generate images and fine-grained captions using Stable Diffusion and BLIP models trained on our dataset. We perform extensive experiments using state-of-the-art text-to-image ReID models and vision-language pre-trained models, and present a comprehensive analysis of the dataset. Our experiments also reveal that synthetically generated data leads to substantial performance improvements in both same-dataset and cross-dataset settings. Our dataset is available at https://bit.ly/3pkA3Rj.

updated: Mon May 08 2023 06:46:56 GMT+0000 (UTC)

published: Mon May 08 2023 06:46:56 GMT+0000 (UTC)

arXiv
