External Camera-based Mobile Robot Pose Estimation for Collaborative Perception with Smart Edge Sensors

Simon Bultmann; Raphael Memmesheimer; Sven Behnke

We present an approach for estimating a mobile robot's pose with respect to the allocentric coordinates of a network of static cameras using multi-view RGB images. The images are processed online and locally on smart edge sensors by deep neural networks, which detect the robot and estimate 2D keypoints defined at distinctive positions of the 3D robot model. Robot keypoint detections are synchronized and fused on a central backend, where the robot's pose is estimated via multi-view minimization of reprojection errors. With pose estimation from external cameras, the robot's localization can be initialized in an allocentric map from a completely unknown state (the kidnapped robot problem) and robustly tracked over time. We conduct a series of experiments evaluating the accuracy and robustness of the camera-based pose estimation compared to the robot's internal navigation stack. They show that our camera-based method achieves pose errors below 3 cm and 1° and does not drift over time, as the robot is localized allocentrically. With the robot's pose precisely estimated, its observations can be fused into the allocentric scene model. We show a real-world application in which observations from the mobile robot and the static smart edge sensors are fused to collaboratively build a 3D semantic map of an approximately 240 m^2 indoor environment.
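
The multi-view minimization of reprojection errors mentioned above can be written, in a generic form, as the following least-squares problem. This is only a sketch under stated assumptions and not necessarily the exact cost function used by the authors: the symbols T (robot pose), p_k (keypoint k in the robot model frame), \pi_c (projection into camera c with known calibration), u_{c,k} (detected 2D keypoint) and w_{c,k} (an optional per-detection confidence weight) are all notation introduced here for illustration.

```latex
% Generic multi-view reprojection-error objective (illustrative notation,
% not taken from the paper):
%   T        robot pose in the allocentric frame (assumed rigid transform)
%   p_k      3D position of keypoint k on the robot model
%   \pi_c    projection into static camera c (intrinsics and extrinsics known)
%   u_{c,k}  2D detection of keypoint k in camera c
%   w_{c,k}  assumed confidence weight of that detection (0 if not detected)
\hat{T} \;=\; \operatorname*{arg\,min}_{T}
  \sum_{c=1}^{C} \sum_{k=1}^{K}
  w_{c,k} \,\bigl\lVert \pi_c\!\left(T\,p_k\right) - u_{c,k} \bigr\rVert_2^{2}
```

Because the cameras are static and calibrated in the allocentric frame, the estimate \hat{T} is expressed directly in map coordinates, which is consistent with the abstract's observation that the camera-based estimate does not drift over time.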

updated: Tue Mar 07 2023 11:03:33 GMT+0000 (UTC)

published: Tue Mar 07 2023 11:03:33 GMT+0000 (UTC)

arXiv
