Sequential Place Learning: Heuristic-Free High-Performance Long-Term Place Recognition

Marvin Chancán; Michael Milford

Sequential matching using hand-crafted heuristics has been standard practice in route-based place recognition for nearly a decade, where it is used to enhance pairwise similarity results. However, the precision-recall performance of these algorithms degrades dramatically when searching over short temporal window (TW) lengths, and they demand high compute and storage costs on large robotic datasets for autonomous navigation research. Here, influenced by biological systems that robustly navigate spacetime scales even without vision, we develop a joint visual and positional representation learning technique, via a sequential process, and design a learning-based CNN+LSTM architecture, trainable via backpropagation through time, for viewpoint- and appearance-invariant place recognition. Our approach, Sequential Place Learning (SPL), is based on a CNN function that visually encodes an environment from a single traversal, thus reducing storage requirements, while an LSTM temporally fuses each visual embedding with corresponding positional data, obtained from any source of motion estimation, for direct sequential inference. In contrast to classical two-stage pipelines, e.g., match-then-temporally-filter, our network directly eliminates false positives while jointly learning sequence matching from a single monocular image sequence, even when using short TWs. We demonstrate that our model outperforms 15 classical methods while setting new state-of-the-art performance standards on 4 challenging benchmark datasets, one of which can be considered solved, with recall rates of 100% at 100% precision, correctly matching all places under extreme sunlight-darkness changes. In addition, we show that SPL can be up to 70x faster to deploy than classical methods on a 729 km route comprising 35,768 consecutive frames. Extensive experiments demonstrate the... Baseline code available at https://github.com/mchancan/deepseqslam
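
The "recall at 100% precision" figure quoted in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: the function name `recall_at_full_precision`, the `(score, is_correct)` input format, and the assumption that every query place has a true match in the reference traversal are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def recall_at_full_precision(matches: List[Tuple[float, bool]]) -> float:
    """Highest recall reachable while precision stays at 100%.

    `matches` holds one (confidence score, is_correct) pair per query place.
    Assumption: every query has a true match, so recall is the fraction of
    all queries that are accepted and correct.
    """
    if not matches:
        return 0.0
    # Rank by descending score. On tied scores, put incorrect matches first,
    # because a single threshold cannot separate equal scores.
    ranked = sorted(matches, key=lambda m: (-m[0], m[1]))
    accepted_correct = 0
    for _score, is_correct in ranked:
        if not is_correct:
            break  # accepting this match would drop precision below 100%
        accepted_correct += 1
    return accepted_correct / len(ranked)


# Hypothetical scores for four query places; the third-ranked match is wrong.
example = [(0.9, True), (0.8, True), (0.7, False), (0.6, True)]
print(recall_at_full_precision(example))  # 0.5
```

Under this definition, a dataset counts as "solved" in the abstract's sense when the function returns 1.0, that is, when no incorrect match outranks any correct one.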

updated: Tue Mar 02 2021 22:57:43 GMT+0000 (UTC)

published: Tue Mar 02 2021 22:57:43 GMT+0000 (UTC)

arXiv
