NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization

Shitao Tang; Sicong Tang; Andrea Tagliasacchi; Ping Tan; Yasutaka Furukawa

This paper presents an end-to-end neural mapping method for camera localization that encodes a whole scene into a grid of latent codes, from which a Transformer-based auto-decoder regresses the 3D coordinates of query pixels. State-of-the-art camera localization methods require each scene to be stored as a 3D point cloud with per-point features, which takes several gigabytes of storage per scene. While compression is possible, performance drops significantly at high compression rates. NeuMap achieves extremely high compression rates with minimal performance drop by using 1) learnable latent codes to store scene information and 2) a scene-agnostic Transformer-based auto-decoder to infer coordinates for a query pixel. The scene-agnostic network design also learns robust matching priors by training on large-scale data, and further allows the codes to be optimized quickly for a new scene while the network weights stay fixed. Extensive evaluations on five benchmarks show that NeuMap significantly outperforms all other coordinate regression methods and achieves performance similar to feature matching methods while having a much smaller scene representation. For example, NeuMap achieves 39.1% accuracy on the Aachen night benchmark with only 6MB of data, while other strong methods require 100MB or a few gigabytes and fail completely under high compression settings. The code is available at https://github.com/Tangshitao/NeuMap.
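The central idea of the abstract, storing a scene as latent codes and fitting those codes for a new scene while the shared decoder stays frozen ("auto-decoding"), can be illustrated with a toy sketch. It assumes a hypothetical linear decoder in place of NeuMap's Transformer, one code per voxel, and synthetic data; `decode`, `fit_code`, and `voxel_index` are illustrative names, not functions from the NeuMap repository.

```python
import math
import random

# Toy illustration of auto-decoding with a frozen, scene-agnostic decoder.
# ASSUMPTION: the decoder is a hypothetical linear map
#   coord = A @ z + B @ q
# standing in for NeuMap's Transformer; z is a latent code stored in one
# voxel of the scene grid and q is a query-pixel feature.

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def decode(A, B, z, q):
    """Predict a 3D scene coordinate from latent code z and feature q."""
    return [a + b for a, b in zip(matvec(A, z), matvec(B, q))]

def voxel_index(point, voxel_size):
    """Map a 3D point to the integer key of the grid cell containing it."""
    return tuple(math.floor(c / voxel_size) for c in point)

def fit_code(A, B, pairs, dim, steps=200, lr=0.5):
    """Gradient descent on the latent code only; A and B are never updated."""
    z = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for q, target in pairs:
            residual = [p - t for p, t in zip(decode(A, B, z, q), target)]
            # Gradient of 0.5 * ||A z + B q - target||^2 w.r.t. z is A^T residual.
            for j in range(dim):
                grad[j] += sum(A[i][j] * residual[i] for i in range(len(A)))
        z = [zj - lr * g / len(pairs) for zj, g in zip(z, grad)]
    return z

# Usage: a decoder shared across scenes (fixed weights) and one voxel of a new scene.
A = [[1.0, 0.2, 0.0], [0.0, 1.0, 0.3], [0.1, 0.0, 1.0]]
random.seed(0)
B = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
z_true = [0.5, -1.0, 2.0]  # synthetic code that generated this voxel's data
pairs = []
for _ in range(8):
    q = [random.uniform(-1, 1) for _ in range(4)]
    pairs.append((q, decode(A, B, z_true, q)))

codes = {}  # scene representation: voxel key -> latent code
key = voxel_index((1.5, -0.2, 3.9), voxel_size=1.0)
codes[key] = fit_code(A, B, pairs, dim=3)
print(key, [round(v, 4) for v in codes[key]])
```

In this toy setting only `z` is updated, so adding a scene means storing one small code per voxel rather than retraining the decoder. This mirrors, in a greatly simplified form, the property the abstract credits for fast adaptation to new scenes and small scene representations.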

updated: Mon Nov 21 2022 04:46:22 GMT+0000 (UTC)

published: Mon Nov 21 2022 04:46:22 GMT+0000 (UTC)

arXiv