CVTNet: A Cross-View Transformer Network for Place Recognition Using LiDAR Data

Junyi Ma; Guangming Xiong; Jingyi Xu; Xieyuanli Chen

LiDAR-based place recognition (LPR) is one of the most crucial components of autonomous vehicles, allowing them to identify previously visited places in GPS-denied environments. Most existing LPR methods use conventional representations of the input point cloud without considering different views, and so may not fully exploit the information from LiDAR sensors. In this paper, we propose a cross-view transformer-based network, dubbed CVTNet, to fuse the range image views (RIVs) and bird's eye views (BEVs) generated from the LiDAR data. It extracts correlations within each view using intra-transformers and between the two views using inter-transformers. Based on that, CVTNet generates a yaw-angle-invariant global descriptor for each laser scan, end-to-end and online, and retrieves previously seen places by matching descriptors between the current query scan and a pre-built database. We evaluate our approach on three datasets collected with different sensor setups and environmental conditions. The experimental results show that our method outperforms state-of-the-art LPR methods and is strongly robust to viewpoint changes and long time spans. Furthermore, our approach runs in real time, faster than the typical LiDAR frame rate. The implementation of our method is released as open source at: https://github.com/BIT-MJY/CVTNet.
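The retrieval step described in the abstract, matching a query descriptor against a pre-built database, can be illustrated with a minimal sketch. This is not the CVTNet implementation: the function name `retrieve_place`, the 256-dimensional descriptor size, the Euclidean distance metric, and the random toy data are all assumptions made for illustration, and the descriptors here are random vectors rather than network outputs.

```python
import numpy as np


def retrieve_place(query, database, top_k=1):
    """Return indices and distances of the top_k database descriptors nearest to query.

    Hypothetical helper: `query` has shape (D,), `database` has shape (N, D).
    Euclidean distance is an assumed metric, not one confirmed by the abstract.
    """
    # Distance between the query and every database descriptor.
    dists = np.linalg.norm(database - query, axis=1)
    # Indices of the top_k smallest distances, closest first.
    order = np.argsort(dists)[:top_k]
    return order, dists[order]


# Toy data standing in for global descriptors of previously visited places.
rng = np.random.default_rng(0)
database = rng.normal(size=(100, 256))

# A query that revisits place 42, perturbed by a small amount of noise.
query = database[42] + 0.01 * rng.normal(size=256)

indices, distances = retrieve_place(query, database, top_k=3)
print(indices[0])
```

With a brute-force search like this, the cost grows linearly with the database size; a larger database would typically call for an approximate nearest-neighbor index instead.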

updated: Fri Feb 03 2023 11:37:20 GMT+0000 (UTC)

published: Fri Feb 03 2023 11:37:20 GMT+0000 (UTC)

arXiv
