MixVPR: Feature Mixing for Visual Place Recognition

Amar Ali-bey; Brahim Chaib-draa; Philippe Giguère

Visual Place Recognition (VPR) is a crucial part of mobile robotics and autonomous driving as well as other computer vision tasks. It refers to the process of identifying a place depicted in a query image using only computer vision. At large scale, repetitive structures, weather and illumination changes pose a real challenge, as appearances can drastically change over time. Along with tackling these challenges, an efficient VPR technique must also be practical in real-world scenarios where latency matters. To address this, we introduce MixVPR, a new holistic feature aggregation technique that takes feature maps from pre-trained backbones as a set of global features. Then, it incorporates a global relationship between elements in each feature map in a cascade of feature mixing, eliminating the need for local or pyramidal aggregation as done in NetVLAD or TransVPR. We demonstrate the effectiveness of our technique through extensive experiments on multiple large-scale benchmarks. Our method outperforms all existing techniques by a large margin while having less than half the number of parameters compared to CosPlace and NetVLAD. We achieve a new all-time high recall@1 score of 94.6% on Pitts250k-test, 88.0% on MapillarySLS, and more importantly, 58.4% on Nordland. Finally, our method outperforms two-stage retrieval techniques such as Patch-NetVLAD, TransVPR and SuperGLUE all while being orders of magnitude faster. Our code and trained models are available at https://github.com/amaralibey/MixVPR.
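The aggregation idea in the abstract (treat each backbone feature map as one global feature, then pass the set through a cascade of feature mixing) can be illustrated with a small NumPy sketch. The abstract does not specify the mixing block, so everything below is an assumption made for illustration: each mixing step is taken to be a residual two-layer MLP applied along the flattened spatial axis, the final projection is a single matrix, and the weights are random rather than trained. The names `feature_mixer`, `mix_descriptor`, and `make_random_params` are hypothetical and are not taken from the authors' repository.

```python
import numpy as np


def feature_mixer(x, w1, b1, w2, b2):
    """One assumed mixing step: a residual MLP along the flattened spatial axis.

    x has shape (channels, h * w); each row is one flattened feature map,
    so every output element can depend on every position of that map.
    """
    hidden = np.maximum(x @ w1 + b1, 0.0)  # linear layer followed by ReLU
    return x + hidden @ w2 + b2            # second linear layer plus residual


def mix_descriptor(feature_maps, layers, w_out):
    """Flatten (c, h, w) maps, run the cascade, project, and L2-normalize."""
    c, h, w = feature_maps.shape
    x = feature_maps.reshape(c, h * w)     # one row per feature map
    for w1, b1, w2, b2 in layers:          # the cascade of mixing steps
        x = feature_mixer(x, w1, b1, w2, b2)
    desc = (x @ w_out).ravel()             # assumed projection, then flatten
    return desc / np.linalg.norm(desc)     # unit-length global descriptor


def make_random_params(n, depth, out_dim, rng):
    """Random stand-ins for trained weights (illustration only)."""
    layers = [
        (rng.normal(0.0, 0.02, (n, n)), np.zeros(n),
         rng.normal(0.0, 0.02, (n, n)), np.zeros(n))
        for _ in range(depth)
    ]
    w_out = rng.normal(0.0, 0.02, (n, out_dim))
    return layers, w_out


rng = np.random.default_rng(0)
maps = rng.normal(size=(8, 5, 5))          # stand-in for backbone output (c, h, w)
layers, w_out = make_random_params(n=25, depth=4, out_dim=4, rng=rng)
descriptor = mix_descriptor(maps, layers, w_out)
print(descriptor.shape)                    # (32,)
```

Because the mixing weights act on whole flattened maps, no local or pyramidal pooling step appears in the sketch; retrieval would then compare such unit-length descriptors, for example by dot product.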

updated: Fri Mar 03 2023 19:24:03 GMT+0000 (UTC)

published: Fri Mar 03 2023 19:24:03 GMT+0000 (UTC)

arXiv

