LEDCNet: A Lightweight and Efficient Semantic Segmentation Algorithm Using Dual Context Module for Extracting Ground Objects from UAV Aerial Remote Sensing Images

Xiaoxiang Han; Yiman Liu; Gang Liu; Qiaohong Liu

In the surveying and mapping field, deep-learning-based semantic segmentation for extracting ground objects, such as roads and houses, from UAV remote sensing images has become more efficient and convenient than traditional manual segmentation. In recent years, as networks have grown deeper and more complex, the number of parameters in convolution-based semantic segmentation networks has increased considerably, which hinders wide application, especially in industry. To make the model lightweight while improving its accuracy, a new lightweight and efficient network for extracting ground objects from UAV remote sensing images, named LEDCNet, is proposed. The proposed network adopts an encoder-decoder architecture in which a powerful lightweight backbone network, LDCNet, is developed as the encoder. LDCNet is intended to be extended into a new generation of backbone network for lightweight semantic segmentation algorithms. In the decoder, dual multi-scale context modules, consisting of the ASPP module and the OCR module, are designed to capture more context information from the feature maps of UAV remote sensing images. Between ASPP and OCR, an FPN module is used to fuse the multi-scale features extracted by ASPP. A private dataset of UAV remote sensing images is constructed, containing 2431 training images, 945 validation images, and 475 test images. The proposed model performs well on this dataset, achieving an mIoU of 71.12% with only 1.4M parameters and 5.48G FLOPs. More extensive experiments on the public LoveDA and CITY-OSM datasets further verify the effectiveness of the proposed model, with mIoU of 65.27% and 74.39%, respectively. All the experimental results show that the proposed model not only lightens the network with few parameters but also improves segmentation performance.
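For readers unfamiliar with the mIoU figures quoted in the abstract, the following is a minimal sketch of how mean intersection over union is commonly computed from a confusion matrix. It is not the authors' evaluation code: the abstract does not state the exact protocol (for example, whether any class is ignored or how scores are aggregated across images), and the function names `confusion_matrix` and `mean_iou` as well as the toy labels are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows are ground-truth classes, columns are predictions."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:  # assumption: skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with flattened pixel labels for three hypothetical classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
miou = mean_iou(cm)
print(f"mIoU = {miou:.4f}")
```

In this toy case the per-class IoU values are 1/3, 2/3, and 1/2, so the mean is 0.5. In practice the confusion matrix is usually accumulated over every pixel of the whole test set before the per-class ratios are taken.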

updated: Tue Dec 27 2022 15:55:28 GMT+0000 (UTC)

published: Fri Dec 16 2022 14:02:12 GMT+0000 (UTC)

arXiv
