ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations

Tongzhou Mu; Zhan Ling; Fanbo Xiang; Derek Yang; Xuanlin Li; Stone Tao; Zhiao Huang; Zhiwei Jia; Hao Su



Object manipulation from 3D visual inputs poses many challenges for building generalizable perception and policy models. However, the 3D assets in existing benchmarks mostly lack the diversity of 3D shapes needed to match real-world intra-class complexity in topology and geometry. Here we propose the SAPIEN Manipulation Skill Benchmark (ManiSkill) to benchmark manipulation skills over diverse objects in a full-physics simulator. The 3D assets in ManiSkill include large intra-class topological and geometric variations. Tasks are carefully chosen to cover distinct types of manipulation challenges. Recent progress in 3D vision also leads us to believe that the benchmark should be customized so that the challenge is inviting to researchers working on 3D deep learning. To this end, we simulate a moving panoramic camera that returns ego-centric point clouds or RGB-D images. In addition, we would like ManiSkill to serve a broad set of researchers interested in manipulation research. Besides supporting the learning of policies from interactions, we also support learning-from-demonstrations (LfD) methods by providing a large number of high-quality demonstrations (~36,000 successful trajectories, ~1.5M point cloud/RGB-D frames in total). We provide baselines using 3D deep learning and LfD algorithms. All code of our benchmark (simulator, environment, SDK, and baselines) is open-sourced, and a challenge open to interdisciplinary researchers will be held based on the benchmark.
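The abstract notes that the simulated camera returns either ego-centric point clouds or RGB-D images. The two observation types are related by back-projection: each depth pixel maps to a 3D point through the camera intrinsics, and points from several views can be moved into one shared frame and concatenated. The sketch below illustrates this under stated assumptions: an ideal pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, depth in meters, and a hypothetical per-camera rotation/translation into a robot-centric frame. The function names are illustrative only; they are not ManiSkill's API, and this is not necessarily how ManiSkill itself builds its point clouds.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def depth_to_point_cloud(depth: Sequence[Sequence[float]],
                         fx: float, fy: float,
                         cx: float, cy: float) -> List[Point]:
    """Back-project a depth image (assumed in meters) into camera-frame points.

    Assumes an ideal pinhole camera (hypothetical intrinsics fx, fy, cx, cy);
    pixels with non-positive depth are treated as missing and skipped.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):          # v: pixel row index
        for u, z in enumerate(row):          # u: pixel column index
            if z <= 0.0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def transform_points(points: Sequence[Point],
                     rotation: Sequence[Sequence[float]],
                     translation: Sequence[float]) -> List[Point]:
    """Apply p' = R p + t to move points from a camera frame into a shared frame."""
    out: List[Point] = []
    for p in points:
        x, y, z = (
            sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
            for i in range(3)
        )
        out.append((x, y, z))
    return out


def fuse_views(views) -> List[Point]:
    """Concatenate several depth views into one point cloud in the shared frame.

    Each view is a hypothetical tuple (depth, (fx, fy, cx, cy), rotation, translation).
    """
    fused: List[Point] = []
    for depth, (fx, fy, cx, cy), rotation, translation in views:
        cam_points = depth_to_point_cloud(depth, fx, fy, cx, cy)
        fused.extend(transform_points(cam_points, rotation, translation))
    return fused
```

A panoramic rig would, under this assumption, simply pass one entry per camera to `fuse_views`; keeping the per-camera pose separate from back-projection lets the same depth-to-points step be reused for every view.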

updated: Sat Aug 28 2021 23:06:15 GMT+0000 (UTC)

published: Fri Jul 30 2021 08:20:22 GMT+0000 (UTC)

arXiv
