Aerial Scene Parsing: From Tile-level Scene Classification to Pixel-wise Semantic Labeling

Yang Long; Gui-Song Xia; Liangpei Zhang; Gong Cheng; Deren Li

Given an aerial image, aerial scene parsing (ASP) aims to interpret the semantic structure of the image content, e.g., by assigning a semantic label to every pixel of the image. With the popularization of data-driven methods, the past decades have witnessed promising progress on ASP for high-resolution aerial images, approaching the problem through either tile-level scene classification or segmentation-based image analysis. However, the former scheme often produces results with tile-wise boundaries, while the latter must handle the complex modeling process from pixels to semantics, which often requires large-scale, well-annotated image samples with pixel-wise semantic labels. In this paper, we address these issues in ASP, moving from tile-level scene classification to pixel-wise semantic labeling. Specifically, we first revisit aerial image interpretation through a literature review. We then present Million-AID, a large-scale scene classification dataset that contains one million aerial images. With this dataset, we also report benchmarking experiments using classical convolutional neural networks (CNNs). Finally, we perform ASP by unifying tile-level scene classification and object-based image analysis to achieve pixel-wise semantic labeling. Intensive experiments show that Million-AID is a challenging yet useful dataset, which can serve as a benchmark for evaluating newly developed algorithms. When transferring knowledge from Million-AID, fine-tuned CNN models pretrained on Million-AID perform consistently better than those pretrained on ImageNet for aerial scene classification. Moreover, our hierarchical multi-task learning method achieves state-of-the-art pixel-wise classification on the challenging GID, bridging tile-level scene classification toward pixel-wise semantic labeling for aerial image interpretation.
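
The tile-wise boundary issue mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it only assumes that a tile-level classifier has already produced one label per tile, and it shows that spreading those labels over pixels can only place label changes at multiples of the tile size. The function names, the class names, and the tile size are all hypothetical.

```python
from typing import List


def tile_labels_to_pixel_map(tile_labels: List[List[str]], tile_size: int) -> List[List[str]]:
    """Expand per-tile labels to a per-pixel map (each pixel inherits its tile's label).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    pixel_map: List[List[str]] = []
    for tile_row in tile_labels:
        # Build one pixel row: repeat each tile label tile_size times horizontally.
        pixel_row = [label for label in tile_row for _ in range(tile_size)]
        # Repeat that pixel row tile_size times vertically (independent copies).
        pixel_map.extend(list(pixel_row) for _ in range(tile_size))
    return pixel_map


def label_change_columns(pixel_row: List[str]) -> List[int]:
    """Return the column indices where the label differs from the previous pixel."""
    return [x for x in range(1, len(pixel_row)) if pixel_row[x] != pixel_row[x - 1]]


# Illustrative 2x2 grid of tile predictions (class names are made up).
tile_labels = [["water", "forest"],
               ["water", "water"]]
pixel_map = tile_labels_to_pixel_map(tile_labels, tile_size=4)

# Label changes can only occur on tile borders, i.e. at multiples of tile_size.
changes_top = label_change_columns(pixel_map[0])
changes_bottom = label_change_columns(pixel_map[-1])
```

In this sketch every boundary in the pixel map falls on a tile border, however the true land-cover boundary runs inside a tile, which is the limitation that motivates moving toward pixel-wise semantic labeling.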

updated: Sun Jan 09 2022 05:23:03 GMT+0000 (UTC)

published: Thu Jan 06 2022 07:40:47 GMT+0000 (UTC)

arXiv
