Kimera: from SLAM to Spatial Perception with 3D Dynamic Scene Graphs

Antoni Rosinol; Andrew Violette; Marcus Abate; Nathan Hughes; Yun Chang; Jingnan Shi; Arjun Gupta; Luca Carlone

Humans are able to form a complex mental model of the environment they move in. This mental model captures geometric and semantic aspects of the scene, describes the environment at multiple levels of abstraction (e.g., objects, rooms, buildings), and includes static and dynamic entities and their relations (e.g., a person is in a room at a given time). In contrast, current robots' internal representations still provide a partial and fragmented understanding of the environment, either in the form of a sparse or dense set of geometric primitives (e.g., points, lines, planes, voxels) or as a collection of objects. This paper attempts to reduce the gap between robot and human perception by introducing a novel representation, a 3D Dynamic Scene Graph (DSG), that seamlessly captures metric and semantic aspects of a dynamic environment. A DSG is a layered graph where nodes represent spatial concepts at different levels of abstraction, and edges represent spatio-temporal relations among nodes. Our second contribution is Kimera, the first fully automatic method to build a DSG from visual-inertial data. Kimera includes state-of-the-art techniques for visual-inertial SLAM, metric-semantic 3D reconstruction, object localization, human pose and shape estimation, and scene parsing. Our third contribution is a comprehensive evaluation of Kimera on real-life datasets and photo-realistic simulations, including a newly released dataset, uHumans2, which simulates a collection of crowded indoor and outdoor scenes. Our evaluation shows that Kimera achieves state-of-the-art performance in visual-inertial SLAM, estimates an accurate 3D metric-semantic mesh model in real-time, and builds a DSG of a complex indoor environment with tens of objects and humans in minutes. Our final contribution shows how to use a DSG for real-time hierarchical semantic path-planning. The core modules in Kimera are open-source.
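The abstract describes a DSG as a layered graph whose nodes are spatial concepts at different levels of abstraction and whose edges are spatio-temporal relations. The sketch below is a minimal, hypothetical Python illustration of that idea, not Kimera's implementation or API. The layer names (`OBJECT`, `PLACE`, `ROOM`, `BUILDING`), the class and method names, the breadth-first planner, and the kitchen/hall/office toy data are all assumptions chosen for illustration. It shows intra-layer edges, child-to-parent edges between adjacent layers, a time-indexed agent trajectory that answers "which room was this person in at time t?", and a two-stage room-then-place search in the spirit of the hierarchical path planning mentioned above.

```python
from bisect import bisect_right
from collections import defaultdict, deque
from dataclasses import dataclass, field


from enum import IntEnum


class Layer(IntEnum):
    """Illustrative abstraction levels, fine to coarse (an assumption, not Kimera's API)."""
    OBJECT = 1    # objects and agents such as humans
    PLACE = 2     # traversable free-space locations
    ROOM = 3
    BUILDING = 4


@dataclass
class Node:
    node_id: str
    layer: Layer
    label: str                                      # semantic class, e.g. "chair", "kitchen"
    trajectory: list = field(default_factory=list)  # agents only: sorted (time, place_id) pairs


class DynamicSceneGraph:
    """Hypothetical layered graph: intra-layer edges plus child-to-parent inter-layer edges."""
    def __init__(self):
        self.nodes = {}
        self.parent = {}                  # child id -> parent id exactly one layer up
        self.adjacent = defaultdict(set)  # undirected intra-layer edges (e.g. traversability)

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_parent(self, child_id, parent_id):
        assert self.nodes[parent_id].layer == self.nodes[child_id].layer + 1  # one layer up
        self.parent[child_id] = parent_id

    def connect(self, a, b):
        assert self.nodes[a].layer == self.nodes[b].layer
        self.adjacent[a].add(b)
        self.adjacent[b].add(a)

    def ancestor(self, node_id, layer):
        """Follow parent edges upward to `layer`; None if the chain breaks."""
        current = node_id
        while current is not None and self.nodes[current].layer < layer:
            current = self.parent.get(current)
        return current

    def path(self, start, goal, allowed=None):
        """Breadth-first search over intra-layer edges, optionally restricted to `allowed`."""
        came_from = {start: None}
        frontier = deque([start])
        while frontier:
            current = frontier.popleft()
            if current == goal:
                break
            for nxt in self.adjacent[current]:
                if nxt not in came_from and (allowed is None or nxt in allowed):
                    came_from[nxt] = current
                    frontier.append(nxt)
        if goal not in came_from:
            return None
        route, node = [], goal
        while node is not None:
            route.append(node)
            node = came_from[node]
        return route[::-1]

    def hierarchical_path(self, start_place, goal_place):
        """Plan over rooms first, then over places restricted to rooms on that route."""
        start_room = self.ancestor(start_place, Layer.ROOM)
        goal_room = self.ancestor(goal_place, Layer.ROOM)
        room_route = None if None in (start_room, goal_room) else self.path(start_room, goal_room)
        if room_route is None:
            return None
        allowed = {n for n, node in self.nodes.items()
                   if node.layer == Layer.PLACE and self.parent.get(n) in room_route}
        return self.path(start_place, goal_place, allowed)

    def room_of_agent_at(self, agent_id, t):
        """Spatio-temporal query: which room was the agent in at time t?"""
        trajectory = self.nodes[agent_id].trajectory
        i = bisect_right([stamp for stamp, _ in trajectory], t) - 1
        return self.ancestor(trajectory[i][1], Layer.ROOM) if i >= 0 else None


# Hypothetical toy scene: kitchen - hall - office in a row, five places, one person walking through.
g = DynamicSceneGraph()
places = {"p1": "kitchen", "p2": "kitchen", "p3": "hall", "p4": "office", "p5": "office"}
for room in ("kitchen", "hall", "office"):
    g.add_node(Node(room, Layer.ROOM, room))
for pid, room in places.items():
    g.add_node(Node(pid, Layer.PLACE, "place"))
    g.add_parent(pid, room)
for a, b in [("kitchen", "hall"), ("hall", "office"), ("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p5")]:
    g.connect(a, b)
g.add_node(Node("person_1", Layer.OBJECT, "human", [(0.0, "p1"), (5.0, "p3"), (9.0, "p5")]))
print(g.hierarchical_path("p1", "p5"))      # ['p1', 'p2', 'p3', 'p4', 'p5']
print(g.room_of_agent_at("person_1", 6.0))  # hall
```

Planning over rooms first limits the place-level search to places inside rooms on the coarse route. In this tiny scene the pruning removes nothing, but on a large building it is one plausible way a hierarchy can shrink the search space.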

updated: Wed Oct 20 2021 18:52:36 GMT+0000 (UTC)

published: Mon Jan 18 2021 06:17:52 GMT+0000 (UTC)

arXiv