Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image

Yujiao Shi; Hongdong Li


This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image against an overhead-view satellite map. Existing methods often treat this problem as cross-view image retrieval and use learned deep features to match the ground-level query image to a partition (e.g., a small patch) of the satellite map. With these methods, localization accuracy is limited by the partitioning density of the satellite map (often on the order of tens of meters). Departing from the conventional wisdom of image retrieval, this paper presents a novel solution that can achieve highly accurate localization. The key idea is to formulate the task as pose estimation and solve it by neural-network-based optimization. Specifically, we design a two-branch CNN to extract robust features from the ground and satellite images, respectively. To bridge the vast cross-view domain gap, we use a Geometry Projection module that projects features from the satellite map to the ground view, based on a relative camera pose. To minimize the differences between the projected features and the observed features, we employ a differentiable Levenberg-Marquardt (LM) module that iteratively searches for the optimal camera pose. The entire pipeline is differentiable and runs end-to-end. Extensive experiments on standard autonomous vehicle localization datasets confirm the superiority of the proposed method. For example, starting from a coarse estimate of the camera location within a wide region of 40 m × 40 m, our method quickly reduces the lateral location error to within 5 m with 80% likelihood on a new KITTI cross-view dataset.
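To make the optimization idea in the abstract concrete, the following is a minimal toy sketch of Levenberg-Marquardt pose refinement. It is not the authors' implementation. It rests on several assumptions that do not come from the paper: the pose is a planar 3-DoF vector (tx, ty, yaw), the "satellite feature map" is a hand-written analytic function (`sat_features`) rather than CNN output, the "projection" is a plain 2D rigid transform of sample points (`project`), the Jacobian is computed by finite differences, and everything runs in plain NumPy with no learning or backpropagation. All function names are hypothetical.

```python
import numpy as np

def sat_features(xy):
    """Hypothetical smooth 2-channel 'satellite feature map' at map coords (N, 2)."""
    return np.sin(xy / 20.0)

def project(pose, pts_vehicle):
    """Move vehicle-frame points (N, 2) into the map frame; pose = (tx, ty, yaw)."""
    tx, ty, yaw = pose
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s], [s, c]])
    return pts_vehicle @ R.T + np.array([tx, ty])

def residual(pose, pts_vehicle, observed):
    """Difference between features predicted at this pose and observed features."""
    return (sat_features(project(pose, pts_vehicle)) - observed).ravel()

def numeric_jacobian(pose, pts_vehicle, observed, eps=1e-6):
    """Central finite-difference Jacobian of the residual w.r.t. the 3 pose parameters."""
    J = np.empty((observed.size, 3))
    for k in range(3):
        d = np.zeros(3)
        d[k] = eps
        r_plus = residual(pose + d, pts_vehicle, observed)
        r_minus = residual(pose - d, pts_vehicle, observed)
        J[:, k] = (r_plus - r_minus) / (2.0 * eps)
    return J

def lm_refine(pose0, pts_vehicle, observed, iters=50, lam=1e-3):
    """Levenberg-Marquardt with Marquardt diagonal scaling and accept/reject damping."""
    pose = np.asarray(pose0, dtype=float)
    r = residual(pose, pts_vehicle, observed)
    cost = r @ r
    for _ in range(iters):
        J = numeric_jacobian(pose, pts_vehicle, observed)
        H = J.T @ J                      # Gauss-Newton approximation of the Hessian
        g = J.T @ r                      # gradient of 0.5 * ||r||^2
        step = np.linalg.solve(H + lam * np.diag(np.diag(H)), -g)
        if np.linalg.norm(step) < 1e-12:
            break                        # converged: the update is negligible
        candidate = pose + step
        r_new = residual(candidate, pts_vehicle, observed)
        cost_new = r_new @ r_new
        if cost_new < cost:              # accept the step and trust the model more
            pose, r, cost = candidate, r_new, cost_new
            lam = max(lam / 10.0, 1e-9)
        else:                            # reject the step and increase damping
            lam *= 10.0
    return pose, cost

# Usage: sample points on a grid around the vehicle, simulate observations at a
# known pose, then recover that pose from a coarse initial guess.
grid = np.linspace(-8.0, 8.0, 5)
pts = np.array([[x, y] for x in grid for y in grid])
true_pose = np.array([2.0, -1.0, 0.1])
observed = sat_features(project(true_pose, pts))
estimate, final_cost = lm_refine(np.zeros(3), pts, observed)
print("estimated pose:", estimate, "final cost:", final_cost)
```

In the method described above, the features would come from the two-branch CNN and the LM update would be differentiable so that the whole pipeline can be trained end-to-end. This sketch only illustrates the damped Gauss-Newton update that such a module iterates.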

updated: Sun Sep 04 2022 12:37:24 GMT+0000 (UTC)

published: Sun Apr 10 2022 19:16:58 GMT+0000 (UTC)

arXiv

