MultiScene: A Large-scale Dataset and Benchmark for Multi-scene Recognition in Single Aerial Images

Yuansheng Hua; Lichao Mou; Pu Jin; Xiao Xiang Zhu

Aerial scene recognition is a fundamental research problem in interpreting high-resolution aerial imagery. Over the past few years, most studies have focused on classifying an image into a single scene category, whereas in real-world scenarios a single image often contains multiple scenes. Therefore, in this paper, we investigate a more practical yet underexplored task: multi-scene recognition in single images. To this end, we create a large-scale dataset, called MultiScene, composed of 100,000 unconstrained high-resolution aerial images. Because manually labeling such images is extremely arduous, we resort to low-cost annotations from crowdsourcing platforms, e.g., OpenStreetMap (OSM). However, OSM data can be incomplete or incorrect, which introduces noise into the image labels. To address this issue, we visually inspect 14,000 images and correct their scene labels, yielding a subset of cleanly annotated images named MultiScene-Clean. With it, deep networks for multi-scene recognition can be developed and evaluated on clean data. Moreover, we provide the crowdsourced annotations of all images for studying network learning with noisy labels. We conduct experiments with an extensive set of baseline models on both MultiScene-Clean and MultiScene to offer benchmarks for multi-scene recognition in single images and for learning from noisy labels in this task, respectively. To facilitate progress, we make our dataset and trained models available at https://gitlab.lrz.de/ai4eo/reasoning/multiscene.
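As an illustration of the task setup, multi-scene recognition can be framed as multi-label classification, where each image carries a set of scene labels rather than one. The sketch below is not taken from the paper or its repository: the scene names, the `multi_hot` and `example_f1` helpers, and the example labels are all hypothetical. It only shows how a crowdsourced (possibly noisy) label set might be compared against a manually corrected one for a single image.

```python
from typing import Iterable, List, Set

# Hypothetical scene categories; the actual MultiScene label set is not listed here.
SCENES: List[str] = ["residential", "parking lot", "river", "farmland"]


def multi_hot(labels: Iterable[str], classes: List[str] = SCENES) -> List[int]:
    """Encode a set of scene names as a 0/1 vector with one entry per class."""
    present: Set[str] = set(labels)
    unknown = present - set(classes)
    if unknown:
        raise ValueError(f"unknown scene labels: {sorted(unknown)}")
    return [1 if name in present else 0 for name in classes]


def example_f1(reference: List[int], candidate: List[int]) -> float:
    """F1 score between two multi-hot vectors describing the same image."""
    pairs = list(zip(reference, candidate))
    tp = sum(1 for r, c in pairs if r == 1 and c == 1)
    fp = sum(1 for r, c in pairs if r == 0 and c == 1)
    fn = sum(1 for r, c in pairs if r == 1 and c == 0)
    if tp == 0:
        # Two empty label sets agree perfectly; otherwise there is no overlap.
        return 1.0 if fp == 0 and fn == 0 else 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Made-up labels for one image: manually corrected vs. crowdsourced.
clean = multi_hot(["residential", "parking lot", "river"])
noisy = multi_hot(["residential", "farmland"])
print(clean, noisy, example_f1(clean, noisy))
```

In this made-up case the crowdsourced labels miss two scenes and add one spurious scene, so the per-image F1 against the corrected labels is 0.4.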

updated: Tue Sep 07 2021 13:02:45 GMT+0000 (UTC)

published: Wed Apr 07 2021 01:09:12 GMT+0000 (UTC)

arXiv
