MetaGraspNet_v0: A Large-Scale Benchmark Dataset for Vision-driven Robotic Grasping via Physics-based Metaverse Synthesis

Yuhao Chen; E. Zhixuan Zeng; Maximilian Gilles; Alexander Wong

There has been increasing interest in smart factories powered by robotics systems to tackle repetitive, laborious tasks. One impactful yet challenging task in robotics-powered smart factory applications is robotic grasping: using robotic arms to grasp objects autonomously in different settings. Robotic grasping requires a variety of computer vision tasks, including object detection, segmentation, grasp prediction, and pick planning. While significant progress has been made in leveraging machine learning for robotic grasping, particularly with deep learning, a major challenge remains: the need for large-scale, high-quality RGBD datasets that cover a wide diversity of scenarios and permutations. To tackle this big, diverse data problem, we are inspired by the recent rise of the metaverse concept, which has greatly narrowed the gap between virtual worlds and the physical world. Metaverses allow us to create digital twins of real-world manufacturing scenarios and to virtually create different scenarios from which large volumes of data can be generated for training models. In this paper, we present MetaGraspNet: a large-scale benchmark dataset for vision-driven robotic grasping via physics-based metaverse synthesis. The proposed dataset contains 100,000 images and 25 different object types and is split into 5 difficulty levels to evaluate object detection and segmentation model performance in different grasping scenarios. We also propose a new layout-weighted performance metric alongside the dataset for evaluating object detection and segmentation performance in a manner that is more appropriate for robotic grasping applications than existing general-purpose performance metrics. Our benchmark dataset is available open-source on Kaggle, with the first phase consisting of detailed object detection, segmentation, and layout annotations, along with a layout-weighted performance metric script.

updated: Tue Aug 30 2022 17:53:40 GMT+0000 (UTC)

published: Wed Dec 29 2021 17:23:24 GMT+0000 (UTC)

arXiv
