Learning Cross-Scale Visual Representations for Real-Time Image Geo-Localization

Tianyi Zhang; Matthew Johnson-Roberson

Robot localization remains a challenging task in GPS-denied environments. State estimation approaches based on local sensors, e.g., cameras or IMUs, are prone to drift on long-range missions as error accumulates. In this study, we aim to address this problem by localizing image observations in a 2D multi-modal geospatial map. We introduce a cross-scale dataset and a methodology to produce additional data from cross-modality sources. We propose a framework that learns cross-scale visual representations without supervision. Experiments are conducted on data from two different domains: underwater and aerial. In contrast to existing studies in cross-view image geo-localization, our approach a) performs better on smaller-scale multi-modal maps; b) is more computationally efficient for real-time applications; and c) can serve directly in concert with state estimation pipelines.

updated: Thu Sep 09 2021 08:08:54 GMT+0000 (UTC)

published: Thu Sep 09 2021 08:08:54 GMT+0000 (UTC)

arXiv
