Neural Implicit Dense Semantic SLAM

Yasaman Haghighi; Suryansh Kumar; Jean Philippe Thiran; Luc Van Gool

This paper presents an efficient online framework to solve the well-known semantic Visual Simultaneous Localization and Mapping (V-SLAM) problem for indoor scenes, leveraging the advantages of neural implicit scene representation. Existing methods along similar lines, such as NICE-SLAM, have critical practical limitations that hinder their use for such an important indoor scene understanding problem. To this end, in contrast to existing methods that assume RGB-D frames as input, we argue for the following propositions for modern semantic V-SLAM: (i) for a rigid scene, robust and accurate camera motion can be computed with a disentangled tracking and 3D mapping pipeline; (ii) using neural fields, a dense and multifaceted scene representation of SDF, semantics, RGB, and depth can be provided memory-efficiently; (iii) rather than using every frame, a set of keyframes is sufficient to learn an excellent scene representation, thereby improving the pipeline's training time; (iv) multiple local mapping networks can be used to extend the pipeline to large-scale scenes. We show via extensive experiments on several popular benchmark datasets that our approach offers accurate tracking, mapping, and semantic labeling at test time, even with noisy and highly sparse depth measurements. Later in the paper, we show that our pipeline can easily be extended to RGB image input. Overall, the proposed pipeline offers a favorable solution to an important scene understanding task that can assist in diverse robot visual perception and related problems.
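Proposition (iii) can be illustrated with a minimal sketch of keyframe selection. The abstract does not state the criterion the authors use, so the rule below (keep a frame once the camera has moved or rotated enough relative to the last keyframe) is an assumption, and the function names and threshold values are hypothetical.

```python
import numpy as np


def rotation_angle_deg(R_a, R_b):
    """Angle in degrees of the relative rotation between two 3x3 rotation matrices."""
    R_rel = R_a.T @ R_b
    cos_theta = (np.trace(R_rel) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def select_keyframes(poses, min_translation=0.10, min_rotation_deg=10.0):
    """Return indices of frames kept as keyframes.

    poses: sequence of 4x4 camera-to-world matrices (assumed to come from
    a separate tracking module). The thresholds are hypothetical values,
    not taken from the paper.
    """
    if len(poses) == 0:
        return []
    keyframes = [0]  # always keep the first frame
    for i in range(1, len(poses)):
        ref = poses[keyframes[-1]]
        cur = poses[i]
        translation = np.linalg.norm(cur[:3, 3] - ref[:3, 3])
        rotation = rotation_angle_deg(ref[:3, :3], cur[:3, :3])
        # Keep the frame only if the view changed enough since the last keyframe.
        if translation >= min_translation or rotation >= min_rotation_deg:
            keyframes.append(i)
    return keyframes
```

Under this assumed rule, only the selected frames would be passed to the mapping network, which is one way fewer frames could reduce training time.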

updated: Thu Apr 27 2023 23:03:52 GMT+0000 (UTC)

published: Thu Apr 27 2023 23:03:52 GMT+0000 (UTC)

arXiv
