Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization

Dror Aiger; André Araujo; Simon Lynen

Large-scale visual localization systems continue to rely on 3D point clouds built from image collections using structure-from-motion. While the 3D points in these models are represented using local image features, directly matching a query image's local features against the point cloud is challenging due to the scale of the nearest-neighbor search problem. Many recent approaches to visual localization have thus proposed a hybrid method, where first a global (per image) embedding is used to retrieve a small subset of database images, and local features of the query are matched only against those. It seems to have become a common belief that global embeddings are critical for this image-retrieval step in visual localization, despite the significant downside of having to compute two feature types for each query image. In this paper, we take a step back from this assumption and propose Constrained Approximate Nearest Neighbors (CANN), a joint solution to k-nearest-neighbor search across both the geometry and appearance space using only local features. We first derive the theoretical foundation for k-nearest-neighbor retrieval across multiple metrics and then showcase how CANN improves visual localization. Our experiments on public localization benchmarks demonstrate that our method significantly outperforms both state-of-the-art global feature-based retrieval and approaches using local feature aggregation schemes. Moreover, it is an order of magnitude faster in both index and query time than feature aggregation schemes for these datasets. Code will be released.
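To make the two pipelines in the abstract concrete, the toy sketch below uses brute force throughout. The first function follows the hybrid method the abstract describes: shortlist database images by a global embedding, then match query local features only within that shortlist. The second is a naive local-features-only image score, included for contrast. It is explicitly NOT the CANN algorithm, whose formulation the abstract does not spell out. Several choices are assumptions made for illustration: cosine similarity for global embeddings, L2 distance for local descriptors, the mean nearest-descriptor distance as the image score, and the function names `hybrid_retrieve_then_match` and `local_only_image_scores`, which are hypothetical.

```python
import numpy as np


def hybrid_retrieve_then_match(query_global, query_locals, db_globals, db_locals, k):
    """Hypothetical sketch of the hybrid pipeline (global shortlist, then local matching).

    query_global: (D,) global embedding of the query image
    query_locals: (M, d) local descriptors of the query image
    db_globals:   (N, D) one global embedding per database image
    db_locals:    list of N arrays, each (n_i, d), local descriptors per image
    Returns the shortlisted image ids and, for each query local descriptor,
    the (image_id, feature_index) of its nearest neighbor inside the shortlist.
    """
    # Assumption: cosine similarity ranks the global embeddings.
    q = query_global / np.linalg.norm(query_global)
    g = db_globals / np.linalg.norm(db_globals, axis=1, keepdims=True)
    sims = g @ q
    shortlist = np.argsort(-sims)[:k]

    # Gather local descriptors of the shortlisted images only,
    # remembering which image and which position each one came from.
    feats = np.concatenate([db_locals[i] for i in shortlist])
    owners = np.concatenate([np.full(len(db_locals[i]), i) for i in shortlist])
    positions = np.concatenate([np.arange(len(db_locals[i])) for i in shortlist])

    # Brute-force squared-L2 nearest neighbor for every query descriptor.
    d2 = ((query_locals[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1)
    nn = d2.argmin(axis=1)
    matches = [(int(owners[j]), int(positions[j])) for j in nn]
    return [int(i) for i in shortlist], matches


def local_only_image_scores(query_locals, db_locals):
    """Naive local-only baseline (not CANN): for each database image, the mean
    L2 distance from every query descriptor to that image's closest descriptor.
    Lower scores mean a better match."""
    scores = []
    for feats in db_locals:
        d = np.sqrt(((query_locals[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1))
        scores.append(d.min(axis=1).mean())
    return np.array(scores)


# Toy usage with random data: the query is built from database image 2.
rng = np.random.default_rng(0)
db_globals = rng.normal(size=(5, 8))
db_locals = [rng.normal(size=(4, 3)) for _ in range(5)]
shortlist, matches = hybrid_retrieve_then_match(
    db_globals[2].copy(), db_locals[2][:2].copy(), db_globals, db_locals, k=2
)
scores = local_only_image_scores(db_locals[2][:2].copy(), db_locals)
```

The two functions show the trade-off the abstract points to. The hybrid route needs two descriptor types per query, a global embedding and local features, and it can only find local matches inside the shortlist. The local-only score needs one descriptor type, but computed naively it compares against every database descriptor. That cost is the scale problem motivating approximate, constrained nearest-neighbor search.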

updated: Thu Jun 15 2023 10:12:10 GMT+0000 (UTC)

published: Thu Jun 15 2023 10:12:10 GMT+0000 (UTC)

arXiv
