Semantic Labeling of High Resolution Images Using EfficientUNets and Transformers

Hasan AlMarzouqi; Lyes Saad Saoud

Semantic segmentation requires approaches that learn high-level features while processing very large amounts of data. Convolutional neural networks (CNNs) can learn distinctive, adaptive features for this purpose. However, because remote sensing images are large and have high spatial resolution, these networks cannot analyze an entire scene efficiently. Recently, deep transformers have proven their ability to capture global interactions between different objects in an image. In this paper, we propose a new segmentation model that combines convolutional neural networks with transformers, and show that this mixture of local and global feature extraction techniques provides significant advantages in remote sensing segmentation. In addition, the proposed model includes two fusion layers designed to represent the multi-modal inputs and outputs of the network efficiently. The input fusion layer extracts feature maps summarizing the relationship between image content and elevation maps (digital surface models, DSMs). The output fusion layer uses a novel multi-task segmentation strategy in which class labels are identified using class-specific feature extraction layers and loss functions. Finally, a fast-marching method is used to convert all unidentified class labels to the label of their closest known neighbor. Our results demonstrate that the proposed methodology improves segmentation accuracy compared to state-of-the-art techniques.
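The final post-processing step in the abstract, assigning each unidentified pixel the label of its closest known neighbor, can be illustrated with a small sketch. This is not the authors' implementation and it is not a fast-marching solver: it swaps in a simpler multi-source breadth-first search, which measures closeness by 4-connected grid distance. The `UNKNOWN` marker and the function name `fill_unknown_labels` are hypothetical, chosen only for this illustration.

```python
from collections import deque

UNKNOWN = -1  # hypothetical marker for pixels with no identified class


def fill_unknown_labels(labels, unknown=UNKNOWN):
    """Return a copy of `labels` in which every unknown pixel takes the
    label of the nearest known pixel, measured in 4-connected grid steps.

    Stand-in for the fast-marching step described in the abstract; a
    breadth-first search is used here instead of a fast-marching solver.
    """
    rows = len(labels)
    cols = len(labels[0]) if rows else 0
    filled = [list(row) for row in labels]  # leave the input untouched

    # Seed the search front with every pixel whose label is already known.
    queue = deque(
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if filled[r][c] != unknown
    )

    # Grow all known regions outward one step at a time; the first region
    # to reach an unknown pixel claims it.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and filled[nr][nc] == unknown:
                filled[nr][nc] = filled[r][c]
                queue.append((nr, nc))
    return filled
```

Pixels equidistant from two labeled regions go to whichever region reaches them first in queue order, so ties are resolved arbitrarily. A map that contains no known labels is returned unchanged.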

updated: Mon Jun 20 2022 12:03:54 GMT+0000 (UTC)

published: Mon Jun 20 2022 12:03:54 GMT+0000 (UTC)

arXiv
