Pix2Map: Cross-modal Retrieval for Inferring Street Maps from Images

Xindi Wu; KwunFung Lau; Francesco Ferroni; Aljoša Ošep; Deva Ramanan

Self-driving vehicles rely on urban street maps for autonomous navigation. In this paper, we introduce Pix2Map, a method for inferring urban street map topology directly from ego-view images, as needed to continually update and expand existing maps. This is a challenging task, as we need to infer a complex urban road topology directly from raw image data. The main insight of this paper is that this problem can be posed as cross-modal retrieval by learning a joint, cross-modal embedding space for images and existing maps, represented as discrete graphs that encode the topological layout of the visual surroundings. We conduct our experimental evaluation using the Argoverse dataset and show that it is indeed possible to accurately retrieve street maps corresponding to both seen and unseen roads solely from image data. Moreover, we show that our retrieved maps can be used to update or expand existing maps and even show proof-of-concept results for visual localization and image retrieval from spatial graphs.
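The retrieval idea in the abstract can be illustrated with a minimal sketch. It assumes that images and map graphs have already been encoded into vectors in a shared embedding space, and that candidates are ranked by cosine similarity. The abstract specifies neither the encoders nor the similarity measure, so the function names, map identifiers, and toy vectors below are hypothetical and are not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve_map(image_embedding, map_library, top_k=1):
    """Return the ids of the top_k map graphs closest to the image embedding.

    map_library maps a (hypothetical) map-graph id to its embedding vector.
    """
    scored = [
        (cosine_similarity(image_embedding, emb), map_id)
        for map_id, emb in map_library.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [map_id for _, map_id in scored[:top_k]]


# Toy, made-up embeddings standing in for encoded street-map graphs.
map_library = {
    "straight_road": [1.0, 0.0, 0.0],
    "t_junction": [0.0, 1.0, 0.0],
    "four_way_intersection": [0.0, 0.0, 1.0],
}

# Made-up embedding standing in for an encoded ego-view image.
query = [0.1, 0.9, 0.2]
best = retrieve_map(query, map_library, top_k=2)
print(best)
```

In this toy setup the library of existing map graphs acts as the search space: a new image is matched to the stored graph whose embedding lies closest to it, which is the sense in which map inference is posed as retrieval.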

updated: Tue Jan 10 2023 22:05:35 GMT+0000 (UTC)

published: Tue Jan 10 2023 22:05:35 GMT+0000 (UTC)

arXiv
