Semantic Labeling of Large-Area Geographic Regions Using Multi-View and Multi-Date Satellite Images and Noisy OSM Training Labels

Bharath Comandur; Avinash C. Kak

We present a novel multi-view training framework and CNN architecture for combining information from multiple overlapping satellite images and noisy training labels derived from OpenStreetMap (OSM) to semantically label buildings and roads across large geographic regions (100 km^2). Our approach to multi-view semantic segmentation yields a 4-7% improvement in the per-class IoU scores compared to traditional approaches that use the views independently of one another. A unique (and perhaps surprising) property of our system is that the modifications added to the tail end of the CNN for learning from the multi-view data can be discarded at inference time with a relatively small penalty in overall performance. This implies that the benefits of training with multiple views are absorbed by all the layers of the network. Additionally, our approach adds only a small overhead in GPU memory consumption, even when training with as many as 32 views per scene. The system we present is end-to-end automated, which facilitates comparing classifiers trained directly on true orthophotos with classifiers first trained on the off-nadir images whose predicted labels are subsequently translated to geographical coordinates. With no human supervision, our IoU scores for the buildings and roads classes are 0.8 and 0.64, respectively, which are better than those of state-of-the-art approaches that use OSM labels and are not completely automated.
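For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a per-class IoU (intersection over union) score is conventionally computed from a predicted label map and a reference label map. It is not the authors' evaluation code: the function name `per_class_iou`, the flattened-list representation, and the label encoding (0 = background, 1 = building, 2 = road) are assumptions made purely for illustration.

```python
from typing import Dict, Sequence


def per_class_iou(
    pred: Sequence[int], truth: Sequence[int], classes: Sequence[int]
) -> Dict[int, float]:
    """Return IoU = |pred AND truth| / |pred OR truth| for each class label."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of pixels")
    scores = {}
    for c in classes:
        # Pixels labeled c in both maps.
        intersection = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        # Pixels labeled c in at least one map.
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        # A class absent from both maps has no defined IoU; report NaN.
        scores[c] = intersection / union if union else float("nan")
    return scores


# Toy flattened label maps (hypothetical encoding):
# 0 = background, 1 = building, 2 = road.
truth = [0, 1, 1, 1, 2, 2, 0, 0]
pred = [0, 1, 1, 0, 2, 1, 0, 0]
scores = per_class_iou(pred, truth, classes=[1, 2])
```

In this toy example both classes score 0.5: the building class shares two pixels out of four in the union, and the road class shares one out of two. A real evaluation would accumulate the intersection and union counts over all image tiles before dividing.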

updated: Sun Jun 27 2021 02:50:21 GMT+0000 (UTC)

published: Mon Aug 24 2020 09:03:31 GMT+0000 (UTC)

arXiv
