Learning to Evaluate Performance of Multi-modal Semantic Localization

Zhiqiang Yuan; Wenkai Zhang; Chongyang Li; Zhaoying Pan; Yongqiang Mao; Jialiang Chen; Shouke Li; Hongqi Wang; Xian Sun

Semantic localization (SeLo) refers to the task of obtaining the most relevant locations in large-scale remote sensing (RS) images using semantic information such as text. As an emerging task based on cross-modal retrieval, SeLo achieves semantic-level retrieval with only caption-level annotation, which demonstrates its great potential for unifying downstream tasks. Although SeLo has been carried out in successive works, no work has yet systematically explored and analyzed this urgent direction. In this paper, we thoroughly study this field and provide a complete benchmark in terms of metrics and test data to advance the SeLo task. First, based on the characteristics of this task, we propose multiple discriminative evaluation metrics to quantify the performance of the SeLo task. The devised significant area proportion, attention shift distance, and discrete attention distance are used to evaluate the generated SeLo map at the pixel level and the region level. Next, to provide standard evaluation data for the SeLo task, we contribute a diverse, multi-semantic, multi-objective Semantic Localization Testset (AIR-SLT). AIR-SLT consists of 22 large-scale RS images and 59 test cases with different semantics, and aims to provide a comprehensive evaluation of retrieval models. Finally, we analyze the SeLo performance of RS cross-modal retrieval models in detail, explore the impact of different variables on this task, and provide a complete benchmark for the SeLo task. We also establish a new paradigm for RS referring expression comprehension, and demonstrate the great advantage of SeLo in semantics by combining it with tasks such as detection and road extraction. The proposed evaluation metrics, semantic localization test set, and corresponding scripts are openly available at github.com/xiaoyuan1996/SemanticLocalizationMetrics .

updated: Thu Sep 15 2022 01:40:34 GMT+0000 (UTC)

published: Wed Sep 14 2022 09:39:03 GMT+0000 (UTC)

arXiv
