Artifacts Mapping: Multi-Modal Semantic Mapping for Object Detection and 3D Localization

Federico Rollo; Gennaro Raiola; Andrea Zunino; Nikolaos Tsagarakis; Arash Ajoudani

Geometric navigation is now a well-established field of robotics, and the research focus is shifting towards higher-level scene understanding, such as semantic mapping. When a robot needs to interact with its environment, it must be able to comprehend the contextual information of its surroundings. This work focuses on classifying and localizing objects within a map that is either under construction (SLAM) or already built. To further explore this direction, we propose a framework that can autonomously detect and localize predefined objects in a known environment using a multi-modal sensor fusion approach (combining RGB and depth data from an RGB-D camera and a lidar). The framework consists of three key elements: understanding the environment through RGB data, estimating depth through multi-modal sensor fusion, and managing artifacts (i.e., filtering and stabilizing measurements). The experiments show that the proposed framework can accurately detect 98% of the objects in the real sample environment without post-processing, while 85% and 80% of the objects were mapped using the single RGB-D camera and the RGB + lidar setup, respectively. The comparison with single-sensor (camera or lidar) experiments shows that sensor fusion allows the robot to accurately detect near and far obstacles, which would have been noisy or imprecise in a purely visual or laser-based approach.
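The three elements named in the abstract (detection on RGB data, depth estimation through sensor fusion, and artifact management) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the fusion rule or the filter, so the range-based sensor choice, the running-mean filter, and every name and parameter here (`fuse_depth`, `back_project`, `ArtifactManager`, `camera_max_range`, `gate`, `min_hits`) are assumptions made for illustration. The detector is assumed to have already produced a label and a pixel location, and positions stay in the camera frame; a real mapping system would also transform them into the map frame using the robot pose.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def fuse_depth(camera_depth: Optional[float],
               lidar_depth: Optional[float],
               camera_max_range: float = 3.0) -> Optional[float]:
    """Pick one depth value (metres) from the two sensors.

    Assumed rule: trust the RGB-D depth up to camera_max_range,
    otherwise fall back to the lidar range.
    """
    if camera_depth is not None and 0.0 < camera_depth <= camera_max_range:
        return camera_depth
    if lidar_depth is not None and lidar_depth > 0.0:
        return lidar_depth
    return None  # neither sensor gave a usable reading


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Point:
    """Pinhole back-projection of pixel (u, v) at the given depth."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


@dataclass
class Artifact:
    label: str
    position: Point
    hits: int = 1


class ArtifactManager:
    """Hypothetical filter: merges repeated detections of one object."""

    def __init__(self, gate: float = 0.5, min_hits: int = 3):
        self.gate = gate          # max distance (m) to match an existing artifact
        self.min_hits = min_hits  # detections needed before an artifact is reported
        self.artifacts: List[Artifact] = []

    def update(self, label: str, position: Point) -> Artifact:
        for art in self.artifacts:
            if art.label == label and dist(art.position, position) <= self.gate:
                n = art.hits
                # Running mean stabilizes the estimate over repeated measurements.
                art.position = tuple(
                    (p * n + q) / (n + 1) for p, q in zip(art.position, position)
                )
                art.hits = n + 1
                return art
        art = Artifact(label, position)
        self.artifacts.append(art)
        return art

    def confirmed(self) -> List[Artifact]:
        """Artifacts seen often enough to be kept in the map."""
        return [a for a in self.artifacts if a.hits >= self.min_hits]
```

In use, each detection would go through `fuse_depth`, then `back_project`, then `ArtifactManager.update`, and only the entries returned by `confirmed()` would be written to the semantic map. The `min_hits` threshold is one simple way to suppress spurious single-frame detections.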

updated: Tue Nov 21 2023 21:04:24 GMT+0000 (UTC)

published: Mon Jul 03 2023 15:51:39 GMT+0000 (UTC)

arXiv
