Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation of In-hand Objects

Alireza Rezazadeh; Snehal Dikhale; Soshi Iba; Nawid Jamali

Robotic manipulation, in particular in-hand object manipulation, often requires an accurate estimate of the object's 6D pose. To improve the accuracy of the estimated pose, state-of-the-art approaches in 6D object pose estimation use observational data from one or more modalities, e.g., RGB images, depth, and tactile readings. However, existing approaches make limited use of the underlying geometric structure of the object captured by these modalities, thereby increasing their reliance on visual features. This results in poor performance when presented with objects that lack such visual features or when visual features are simply occluded. Furthermore, current approaches do not take advantage of the proprioceptive information embedded in the position of the fingers. To address these limitations, in this paper: (1) we introduce a hierarchical graph neural network architecture for combining multimodal (vision and touch) data that allows for a geometrically informed 6D object pose estimation, (2) we introduce a hierarchical message passing operation that propagates information within and across modalities to learn a graph-based object representation, and (3) we introduce a method that accounts for the proprioceptive information for in-hand object representation. We evaluate our model on a diverse subset of objects from the YCB Object and Model Set, and show that our method substantially outperforms existing state-of-the-art work in accuracy and robustness to occlusion. We also deploy our proposed framework on a real robot and qualitatively demonstrate successful transfer to real settings.

updated: Wed Jun 28 2023 01:18:53 GMT+0000 (UTC)

published: Wed Jun 28 2023 01:18:53 GMT+0000 (UTC)

arXiv
