Generalized Contrastive Optimization of Siamese Networks for Place Recognition

María Leyva-Vallina; Nicola Strisciuglio; Nicolai Petkov

Visual place recognition is a challenging task in computer vision and a key component of camera-based localization and navigation systems. Recently, Convolutional Neural Networks (CNNs) have achieved strong results and good generalization capabilities. They are usually trained on pairs or triplets of images labeled in a binary fashion, as either similar or dissimilar. In practice, the similarity between two images is not binary but continuous. Furthermore, training these CNNs is computationally complex and involves costly pair and triplet mining strategies. We propose a Generalized Contrastive loss (GCL) function that relies on image similarity as a continuous measure, and use it to train a siamese CNN. We also present three techniques for automatically annotating image pairs with labels indicating their degree of similarity, and deploy them to re-annotate the MSLS, TB-Places, and 7Scenes datasets. We demonstrate that siamese CNNs trained using the GCL function and the improved annotations consistently outperform their binary counterparts. Our models trained on MSLS outperform state-of-the-art methods, including NetVLAD, NetVLAD-SARE, AP-GeM and Patch-NetVLAD, and generalize well to the Pittsburgh30k, Tokyo 24/7, RobotCar Seasons v2 and Extended CMU Seasons datasets. Moreover, training a siamese network using the GCL function does not require complex pair mining. We release the source code at https://github.com/marialeyvallina/generalized_contrastive_loss.
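The abstract does not state the loss formula, so the following is only a minimal sketch of how a contrastive loss might be generalized from a binary label to a continuous similarity in [0, 1]. It assumes the similarity weights an attraction term and its complement weights a margin-based repulsion term; the function names, the Euclidean distance, and the `margin` parameter are illustrative assumptions, not the released implementation.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def generalized_contrastive_loss(emb_a, emb_b, similarity, margin=0.5):
    """Hypothetical sketch of a contrastive loss with a continuous label.

    similarity: degree of similarity of the image pair, in [0, 1].
    margin: assumed hyperparameter; pairs farther apart than this are
            no longer pushed away from each other.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie in [0, 1]")
    d = euclidean_distance(emb_a, emb_b)
    attract = 0.5 * d ** 2                      # pulls similar pairs together
    repel = 0.5 * max(margin - d, 0.0) ** 2     # pushes dissimilar pairs apart
    # With similarity in {0, 1} this reduces to the binary contrastive loss.
    return similarity * attract + (1.0 - similarity) * repel
```

With `similarity` set to exactly 0 or 1, this sketch falls back to the usual binary contrastive loss; intermediate values blend the two terms, which is one way to reflect the abstract's point that similarity is continuous.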

updated: Mon May 09 2022 12:37:22 GMT+0000 (UTC)

published: Thu Mar 11 2021 12:32:05 GMT+0000 (UTC)

arXiv
