Learning Sequential Descriptors for Sequence-based Visual Place Recognition

Riccardo Mereu; Gabriele Trivigno; Gabriele Berton; Carlo Masone; Barbara Caputo

In robotics, Visual Place Recognition is a continuous process that takes a video stream as input and produces a hypothesis of the robot's current position within a map of known places. This task requires robust, scalable, and efficient techniques for real applications. This work proposes a detailed taxonomy of techniques using sequential descriptors, highlighting different mechanisms to fuse the information from the individual images. This categorization is supported by a complete benchmark of experimental results that provides evidence on the strengths and weaknesses of these different architectural choices. In comparison to existing sequential descriptor methods, we further investigate the viability of Transformers instead of CNN backbones, and we propose a new ad-hoc sequence-level aggregator called SeqVLAD, which outperforms the prior state of the art on different datasets. The code is available at https://github.com/vandal-vpr/vg-transformers.

updated: Fri Jul 08 2022 12:52:04 GMT+0000 (UTC)

published: Fri Jul 08 2022 12:52:04 GMT+0000 (UTC)

arXiv
