A Novel Self-Supervised Cross-Modal Image Retrieval Method In Remote Sensing

Gencer Sumbul; Markus Müller; Begüm Demir

Due to the availability of multi-modal remote sensing (RS) image archives, one of the most important research topics is the development of cross-modal RS image retrieval (CM-RSIR) methods that search for semantically similar images across different modalities. Existing CM-RSIR methods require a large quantity of high-quality annotated training images. Collecting a sufficient number of reliably labeled images is time-consuming, complex and costly in operational scenarios, and can significantly affect the final accuracy of CM-RSIR. In this paper, we introduce a novel self-supervised CM-RSIR method that aims to: i) model mutual information between different modalities in a self-supervised manner; ii) keep the distributions of the modality-specific feature spaces similar to each other; and iii) define the most similar images within each modality without requiring any annotated training images. To this end, we propose a novel objective made up of three loss functions that simultaneously: i) maximize the mutual information of different modalities to preserve inter-modal similarity; ii) minimize the angular distance of multi-modal image tuples to eliminate inter-modal discrepancies; and iii) increase the cosine similarity of the most similar images within each modality to characterize intra-modal similarities. Experimental results show the effectiveness of the proposed method compared to state-of-the-art methods. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/SS-CM-RSIR.
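The three-term objective described above can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). The abstract does not say how mutual information is estimated, so an InfoNCE-style contrastive loss is used here as an assumed stand-in; the function names, the temperature and the loss weights are likewise hypothetical. The sketch only computes loss values for two lists of paired feature vectors, one list per modality, and does not compute gradients or train a network.

```python
import math

def normalize(v):
    # Scale a feature vector to unit length (a zero vector is left unchanged).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(u, v):
    # Cosine similarity of unit-length vectors, clamped against rounding error.
    return max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))

def info_nce(za, zb, temperature=0.5):
    # ASSUMPTION: InfoNCE-style loss standing in for mutual information
    # maximization. Row i of za is paired with row i of zb; every other row
    # of zb serves as a negative. Lower loss means a tighter lower bound on MI.
    total = 0.0
    for i, anchor in enumerate(za):
        logits = [cosine(anchor, other) / temperature for other in zb]
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum - logits[i]
    return total / len(za)

def angular_distance(za, zb):
    # Mean angle between paired cross-modal features, scaled to [0, 1].
    return sum(math.acos(cosine(a, b)) / math.pi for a, b in zip(za, zb)) / len(za)

def intra_modal_loss(z):
    # For each feature, find its most similar neighbor in the same modality
    # and penalize 1 - cosine similarity, so minimizing raises the similarity.
    total = 0.0
    for i, anchor in enumerate(z):
        nearest = max(cosine(anchor, other) for j, other in enumerate(z) if j != i)
        total += 1.0 - nearest
    return total / len(z)

def combined_objective(feats_a, feats_b, weights=(1.0, 1.0, 1.0)):
    # weights are hypothetical; the abstract does not state how terms are balanced.
    za = [normalize(v) for v in feats_a]
    zb = [normalize(v) for v in feats_b]
    l_mi = 0.5 * (info_nce(za, zb) + info_nce(zb, za))
    l_ang = angular_distance(za, zb)
    l_intra = 0.5 * (intra_modal_loss(za) + intra_modal_loss(zb))
    total = weights[0] * l_mi + weights[1] * l_ang + weights[2] * l_intra
    return total, (l_mi, l_ang, l_intra)
```

Calling `combined_objective(feats_a, feats_b)` on two equally long lists of feature vectors (at least two per modality) returns the weighted sum together with the three individual terms. Under these assumptions, correctly paired features should give a lower total than the same features with the pairing shuffled.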

updated: Tue Jul 12 2022 09:15:47 GMT+0000 (UTC)

published: Wed Feb 23 2022 11:20:24 GMT+0000 (UTC)

arXiv
