VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval

Sijie Zhu; Taojiannan Yang; Chen Chen

Cross-view image geo-localization aims to determine the location of a street-view query image by matching it against GPS-tagged aerial-view reference images. Recent works have achieved surprisingly high retrieval accuracy on city-scale datasets. However, these results rely on the assumption that a reference image exists that is exactly centered at the location of every query image, which does not hold in practical scenarios. In this paper, we redefine the problem under a more realistic assumption: the query image can be taken anywhere in the area of interest, and the reference images are captured before the queries emerge. This assumption breaks the one-to-one retrieval setting of existing datasets, because queries and reference images are no longer perfectly aligned pairs and multiple reference images may cover one query location. To bridge the gap between this realistic setting and existing datasets, we propose VIGOR, a new large-scale benchmark for cross-View Image Geo-localization beyond One-to-one Retrieval. We benchmark existing state-of-the-art methods and propose a novel end-to-end framework that localizes the query in a coarse-to-fine manner. In addition to image-level retrieval accuracy, we evaluate localization accuracy in terms of actual distance (in meters) using the raw GPS data. Extensive experiments under different application scenarios validate the effectiveness of the proposed method. The results indicate that cross-view geo-localization in this realistic setting remains challenging, which should foster new research in this direction. Our dataset and code will be released at https://github.com/Jeff-Zilence/VIGOR
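To make the meter-level evaluation mentioned in the abstract concrete, the following is a minimal sketch of how a localization error in meters could be computed from raw GPS coordinates. It is not the authors' evaluation code: the haversine formula, the mean Earth radius constant, and the function names `haversine_m` and `fraction_within` are all assumptions chosen for illustration.

```python
import math

# Mean Earth radius in meters (an assumed constant for this sketch).
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def fraction_within(predicted, ground_truth, threshold_m):
    """Fraction of queries whose predicted GPS lies within threshold_m of the truth.

    predicted and ground_truth are equal-length lists of (lat, lon) tuples.
    """
    errors = [haversine_m(*p, *t) for p, t in zip(predicted, ground_truth)]
    return sum(e <= threshold_m for e in errors) / len(errors)
```

A call such as `fraction_within(predicted, ground_truth, 10.0)` would then report the share of queries localized to within 10 meters, which complements image-level retrieval accuracy when several reference images cover the same query location.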

updated: Mon Mar 22 2021 04:01:54 GMT+0000 (UTC)

published: Tue Nov 24 2020 15:50:54 GMT+0000 (UTC)

arXiv
