Expediting Building Footprint Segmentation from High-resolution Remote Sensing Images via progressive lenient supervision

Haonan Guo; Bo Du; Chen Wu; Xin Su; Liangpei Zhang

The efficacy of building footprint segmentation from remotely sensed images has been hindered by the limited effectiveness of model transfer. Many existing building segmentation methods were developed upon the encoder-decoder architecture of U-Net, in which the encoder is fine-tuned from newly developed backbone networks pre-trained on ImageNet. However, the heavy computational burden of existing decoder designs hampers the successful transfer of these modern encoder networks to remote sensing tasks. Even the widely adopted deep supervision strategy fails to mitigate these challenges, because its loss is invalid in hybrid regions where foreground and background pixels are intermixed. In this paper, we conduct a comprehensive evaluation of existing decoder network designs for building footprint segmentation and propose an efficient framework, denoted BFSeg, to enhance learning efficiency and effectiveness. Specifically, we propose a densely connected coarse-to-fine feature fusion decoder network that enables easy and fast feature fusion across scales. Moreover, considering the invalidity of hybrid regions in the down-sampled ground truth during deep supervision, we present a lenient deep supervision and distillation strategy that enables the network to learn proper knowledge from deep supervision. Building upon these advancements, we have developed a new family of building segmentation networks that consistently surpass prior works in both performance and efficiency across a wide range of newly developed encoder networks. The code will be released at https://github.com/HaonanGuo/BFSeg-Efficient-Building-Footprint-Segmentation-Framework.
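To make the "hybrid region" problem concrete, the sketch below down-samples a binary building mask by block averaging and flags every coarse cell that mixes building and background pixels. It then computes a binary cross-entropy that ignores those cells. This is a minimal illustration under stated assumptions, not the BFSeg implementation: the function names `downsample_with_hybrid_mask` and `lenient_bce` are hypothetical. It assumes that "lenient" supervision simply excludes hybrid cells from the deep-supervision loss, and it does not model the distillation component the abstract mentions.

```python
import numpy as np


def downsample_with_hybrid_mask(gt, factor):
    """Down-sample a binary mask by block averaging and flag hybrid cells.

    Hypothetical helper for illustration only; not taken from BFSeg.
    Returns (target, valid): the coarse binary target and a boolean mask
    that is False where a block mixes foreground and background pixels.
    """
    h, w = gt.shape
    if h % factor or w % factor:
        raise ValueError("mask dimensions must be divisible by factor")
    # Fraction of foreground pixels inside each factor x factor block.
    frac = gt.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    # Conventional (non-lenient) coarse label: majority vote per block.
    target = (frac >= 0.5).astype(np.float64)
    # Assumption: a block is "hybrid" when it is neither pure background
    # nor pure foreground; only pure blocks are treated as valid supervision.
    valid = (frac == 0.0) | (frac == 1.0)
    return target, valid


def lenient_bce(prob, target, valid, eps=1e-7):
    """Binary cross-entropy averaged over non-hybrid cells only.

    Assumption: hybrid cells are dropped from the loss rather than
    supervised with a (possibly wrong) majority-vote label.
    """
    prob = np.clip(prob, eps, 1.0 - eps)
    loss = -(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))
    n_valid = valid.sum()
    return float((loss * valid).sum() / n_valid) if n_valid else 0.0
```

For example, with a 4×4 mask and `factor=2`, a coarse cell covering a single building pixel out of four gets `valid=False`. Any prediction there then leaves the loss unchanged. A plain deep-supervision loss would instead force the coarse output toward the majority-vote label of 0 for that cell.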

updated: Sun Jul 23 2023 03:55:13 GMT+0000 (UTC)

published: Sun Jul 23 2023 03:55:13 GMT+0000 (UTC)

arXiv
