Swin Transformer coupling CNNs Makes Strong Contextual Encoders for VHR Image Road Extraction

Tao Chen; Yiran Liu; Haoyu Jiang; Ruirui Li

Accurately segmenting roads is challenging because of substantial intra-class variation, weak inter-class distinctions, and occlusions caused by shadows, trees, and buildings. Addressing these challenges requires both attention to important texture details and perception of global geometric context. Recent research has shown that hybrid CNN-Transformer structures outperform either a CNN or a Transformer used alone: CNNs excel at extracting local detail features, while Transformers naturally perceive global contextual information. In this paper, we propose a dual-branch network block named ConSwin that combines ResNet and Swin Transformer for road extraction tasks. The ConSwin block harnesses the strengths of both approaches to better extract detailed and global features. Based on ConSwin, we construct an hourglass-shaped road extraction network and introduce two novel connection structures to better transmit texture and structural detail information to the decoder. Our proposed method outperforms state-of-the-art methods on both the Massachusetts and CHN6-CUG datasets in terms of overall accuracy, IoU, and F1 score. Additional experiments validate the effectiveness of the proposed module, and visualization results demonstrate its ability to obtain better road representations.
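The abstract reports results in terms of overall accuracy, IoU, and F1. As a rough illustration of what these three numbers measure for a binary road mask, the sketch below computes them from pixel-wise confusion counts. It uses the standard textbook definitions; the paper's exact evaluation protocol (for example, any boundary tolerance or per-image averaging) is not stated in the abstract, so the function names `confusion_counts` and `road_metrics` and the nested-list mask format are assumptions made only for this example.

```python
def confusion_counts(pred, target):
    """Count TP, FP, FN, TN over two equally sized binary masks (nested lists)."""
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1          # road predicted, road in ground truth
            elif p and not t:
                fp += 1          # road predicted, background in ground truth
            elif not p and t:
                fn += 1          # background predicted, road in ground truth
            else:
                tn += 1          # background in both
    return tp, fp, fn, tn


def road_metrics(pred, target):
    """Return overall accuracy, IoU, and F1 for the road (positive) class.

    Assumption: standard binary definitions, not necessarily the paper's protocol.
    """
    tp, fp, fn, tn = confusion_counts(pred, target)
    total = tp + fp + fn + tn
    union = tp + fp + fn
    overall_accuracy = (tp + tn) / total if total else 0.0
    iou = tp / union if union else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if union else 0.0
    return {"overall_accuracy": overall_accuracy, "iou": iou, "f1": f1}


if __name__ == "__main__":
    # Toy 2x3 masks: 1 = road pixel, 0 = background.
    prediction = [[1, 1, 0],
                  [0, 1, 0]]
    ground_truth = [[1, 0, 0],
                    [0, 1, 1]]
    print(road_metrics(prediction, ground_truth))
```

Because road pixels usually make up a small share of a VHR image, overall accuracy can be high even for poor masks, which is one reason IoU and F1 on the road class are commonly reported alongside it.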

updated: Sun May 28 2023 06:57:17 GMT+0000 (UTC)

published: Mon Jan 10 2022 06:05:12 GMT+0000 (UTC)

arXiv
