Global and Local Contrastive Self-Supervised Learning for Semantic Segmentation of HR Remote Sensing Images

Haifeng Li; Yi Li; Guo Zhang; Ruoyun Liu; Haozhe Huang; Qing Zhu; Chao Tao

Supervised learning for semantic segmentation requires a large number of labeled samples, which are difficult to obtain in the field of remote sensing. Self-supervised learning (SSL) can address this problem by pre-training a general model on a large number of unlabeled images and then fine-tuning it on a downstream task with very few labeled samples. Contrastive learning is a typical SSL method that can learn general invariant features. However, most existing contrastive learning methods are designed for classification tasks and produce an image-level representation, which may be suboptimal for semantic segmentation tasks that require pixel-level discrimination. Therefore, we propose a global style and local matching contrastive learning network (GLCNet) for remote sensing image semantic segmentation. Specifically, 1) the global style contrastive learning module is used to better learn an image-level representation, as we consider that style features can better represent the overall image features. 2) The local feature matching contrastive learning module is designed to learn representations of local regions, which is beneficial for semantic segmentation. The experimental results show that our method mostly outperforms state-of-the-art (SOTA) self-supervised methods and the ImageNet pre-training method. Specifically, with 1% of the annotations from the original dataset, our approach improves Kappa by 6% on the ISPRS Potsdam dataset relative to the existing baseline. Moreover, our method outperforms supervised learning methods when there are some differences between the datasets of the upstream and downstream tasks. Since SSL can learn the essential characteristics of data directly from unlabeled data, which is easy to obtain in the remote sensing field, this may be of great significance for tasks such as global mapping. The source code is available at https://github.com/GeoX-Lab/G-RSIM.
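
As a rough illustration of the global style contrastive idea described above, the following Python sketch builds a "style" vector from a feature map and scores two augmented views with a contrastive loss. The abstract does not define the style feature or the loss, so two things here are assumptions: the style vector is taken to be the channel-wise mean and standard deviation, and the loss is the NT-Xent loss popularized by SimCLR. The function names style_vector, cosine, and nt_xent are hypothetical and are not taken from the G-RSIM repository.

```python
import math


def style_vector(feature_map):
    """Assumed style descriptor: channel-wise means followed by channel-wise stds.

    feature_map is a list of channels, each a 2D list (H x W) of floats.
    """
    means, stds = [], []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        means.append(mean)
        stds.append(math.sqrt(var))
    return means + stds


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent loss (assumed here, not confirmed by the abstract).

    view_a[i] and view_b[i] are embeddings of two augmentations of sample i.
    Every other embedding in the batch acts as a negative.
    """
    z = view_a + view_b
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same sample
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)


# Toy check: matching views should score a lower loss than mismatched ones.
a = [[1.0, 0.0], [0.0, 1.0]]
aligned = nt_xent(a, [[1.0, 0.0], [0.0, 1.0]])
mismatched = nt_xent(a, [[0.0, 1.0], [1.0, 0.0]])
```

In a real pipeline the inputs to nt_xent would be style vectors computed from encoder feature maps of two augmentations of each image, passed through a projection head; the local matching module would apply a similar loss to features of matched local regions rather than to whole-image descriptors.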

updated: Sat Jan 29 2022 03:43:02 GMT+0000 (UTC)

published: Sun Jun 20 2021 03:03:40 GMT+0000 (UTC)

arXiv
