TwistSLAM++: Fusing multiple modalities for accurate dynamic semantic SLAM

Mathieu Gonzalez; Eric Marchand; Amine Kacete; Jérôme Royan

Most classical SLAM systems rely on the static scene assumption, which limits their applicability in real-world scenarios. Recent SLAM frameworks have been proposed to simultaneously track the camera and moving objects. However, they are often unable to estimate the canonical pose of the objects and exhibit low object tracking accuracy. To solve this problem, we propose TwistSLAM++, a semantic, dynamic SLAM system that fuses stereo images and LiDAR information. Using semantic information, we track potentially moving objects and associate them with 3D object detections in LiDAR scans to obtain their pose and size. Then, we perform registration on consecutive object scans to refine object pose estimation. Finally, object scans are used to estimate the shape of the object and constrain map points to lie on the estimated surface within the bundle adjustment (BA). We show on classical benchmarks that this fusion approach based on multimodal information improves the accuracy of object tracking.
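
To make the scan registration step more concrete, here is a minimal sketch of rigid alignment between two consecutive scans of the same object. It is not the authors' implementation: it assumes planar motion (yaw plus a 2D translation, as for a vehicle on flat ground) and known one-to-one point correspondences, whereas real LiDAR scans require a correspondence search such as ICP. The function name `register_planar` and the sample points are hypothetical.

```python
import math


def register_planar(src, dst):
    """Least-squares rigid 2D alignment of src onto dst.

    Assumption: src[i] and dst[i] are the same physical point seen in two
    consecutive scans (known correspondences). Returns (yaw, (tx, ty)) such
    that rotating a source point by yaw and adding (tx, ty) gives the
    matching destination point in the least-squares sense.
    """
    n = len(src)
    if n == 0 or n != len(dst):
        raise ValueError("src and dst must be non-empty and of equal length")

    # Centroids of both point sets.
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(q[0] for q in dst) / n
    cy_d = sum(q[1] for q in dst) / n

    # Accumulate dot and cross products of the centered correspondences.
    s_dot = 0.0
    s_cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - cx_s, py - cy_s
        bx, by = qx - cx_d, qy - cy_d
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx

    # Closed-form optimal rotation angle for the 2D case.
    yaw = math.atan2(s_cross, s_dot)

    # Translation maps the rotated source centroid onto the target centroid.
    c, s = math.cos(yaw), math.sin(yaw)
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return yaw, (tx, ty)


def apply_planar(points, yaw, t):
    """Apply a planar rigid transform (rotation by yaw, then translation t)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x - s * y + t[0], s * x + c * y + t[1]) for x, y in points]


# Hypothetical object outline observed in one scan, then moved in the next.
scan_a = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
scan_b = apply_planar(scan_a, 0.3, (1.0, -0.5))
estimated_yaw, estimated_t = register_planar(scan_a, scan_b)
```

In a full system, an estimate of this kind would serve only as a refinement of the object pose between frames; the 3D case with unknown correspondences needs an iterative method and outlier handling.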

updated: Wed Mar 22 2023 20:20:48 GMT+0000 (UTC)

published: Fri Sep 16 2022 12:28:21 GMT+0000 (UTC)

arXiv

