Ambiguity-Aware Multi-Object Pose Optimization for Visually-Assisted Robot Manipulation

Myung-Hwan Jeon; Jeongyun Kim; Jee-Hwan Ryu; Ayoung Kim

6D object pose estimation aims to infer the relative pose between an object and the camera from a single image or multiple images. Most works have focused on predicting the object pose under occlusion and structural ambiguity (symmetry) without the associated uncertainty. However, these works demand prior information about shape attributes, a condition that is rarely satisfied in practice; even an asymmetric object may appear symmetric from certain viewpoints. In addition, acquiring and fusing diverse sensor data is challenging when extending these methods to robotics applications. To tackle these limitations, we present an ambiguity-aware 6D object pose estimation network, PrimA6D++, as a generic uncertainty prediction method. The major challenges in pose estimation, such as occlusion and symmetry, can be handled in a generic manner based on the measured ambiguity of the prediction. Specifically, we devise a network that reconstructs the three rotation-axis primitive images of a target object and predicts the underlying uncertainty along each primitive axis. Leveraging the estimated uncertainty, we then optimize multi-object poses using visual measurements and camera poses by treating the task as an object SLAM problem. The proposed method shows a significant performance improvement on the T-LESS and YCB-Video datasets. We further demonstrate real-time scene recognition capability for visually-assisted robot manipulation. Our code and supplementary materials are available at https://github.com/rpmsnu/PrimA6D.

updated: Wed Nov 02 2022 08:57:20 GMT+0000 (UTC)

published: Wed Nov 02 2022 08:57:20 GMT+0000 (UTC)

arXiv
