Improving Semantic Segmentation of Aerial Images Using Patch-based Attention

Lei Ding; Hao Tang; Lorenzo Bruzzone

The trade-off between feature representation power and spatial localization accuracy is crucial for the dense classification/semantic segmentation of aerial images. High-level features extracted from the late layers of a neural network are rich in semantic information but have blurred spatial details; low-level features extracted from the early layers contain more pixel-level information but are isolated and noisy. It is therefore difficult to bridge the gap between high-level and low-level features, because they differ in both physical information content and spatial distribution. In this work, we contribute to solving this problem by enhancing the feature representation in two ways. On the one hand, a patch attention module (PAM) is proposed to enhance the embedding of context information based on a patch-wise calculation of local attention. On the other hand, an attention embedding module (AEM) is proposed to enrich the semantic information of low-level features by embedding local focus from high-level features. Both proposed modules are lightweight and can be applied to the features extracted by convolutional neural networks (CNNs). Experiments show that, by integrating the proposed modules into a baseline Fully Convolutional Network (FCN), the resulting local attention network (LANet) greatly improves performance over the baseline and outperforms other attention-based methods on two aerial image datasets.
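The abstract describes the PAM only as a "patch-wise calculation of local attention" and gives no layer details. The following is a minimal, dependency-free sketch of one possible reading of that idea: the feature map is split into non-overlapping patches, one weight per channel is computed for each patch, and the patch is rescaled by that weight. The function name `patch_attention`, the `patch_size` parameter, and the choice of a sigmoid over the patch mean as the gate are all assumptions made for illustration; the actual module presumably uses learned layers in place of this fixed gate.

```python
import math


def patch_attention(feature, patch_size):
    """Rescale a C x H x W feature map (nested lists) patch by patch.

    Assumption: the attention weight of a patch is sigmoid(mean of the patch).
    This fixed gate is a stand-in for the learned layers of a real module.
    """
    channels = len(feature)
    height = len(feature[0])
    width = len(feature[0][0])
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")

    output = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for c in range(channels):
        for top in range(0, height, patch_size):
            for left in range(0, width, patch_size):
                rows = range(top, top + patch_size)
                cols = range(left, left + patch_size)
                # Patch-wise descriptor: average activation inside this patch.
                total = sum(feature[c][i][j] for i in rows for j in cols)
                mean = total / (patch_size * patch_size)
                # Squash the descriptor into a (0, 1) attention weight.
                gate = 1.0 / (1.0 + math.exp(-mean))
                # Apply the same weight to every pixel of the patch.
                for i in rows:
                    for j in cols:
                        output[c][i][j] = feature[c][i][j] * gate
    return output
```

Because each weight is computed from a single patch rather than from the whole image, two regions of the same channel can be emphasized or suppressed independently, which is the sense in which the attention here is local.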

updated: Wed Nov 20 2019 13:12:04 GMT+0000 (UTC)

published: Wed Nov 20 2019 13:12:04 GMT+0000 (UTC)

arXiv
