Self-Supervised Learning of Multi-Object Keypoints for Robotic Manipulation

Jan Ole von Hartz; Eugenio Chisari; Tim Welschehold; Abhinav Valada

In recent years, policy learning methods based on either reinforcement learning or imitation learning have made significant progress. However, both approaches remain computationally expensive and require large amounts of training data. This problem is especially pronounced in real-world robotic manipulation tasks, where ground-truth scene features are unavailable and policies must instead be learned from raw camera observations. In this paper, we demonstrate the efficacy of learning image keypoints via the Dense Correspondence pretext task for downstream policy learning. Extending prior work to challenging multi-object scenes, we show that our model can be trained to handle important problems in representation learning, primarily scale invariance and occlusion. We evaluate our approach on diverse robot manipulation tasks, compare it to other visual representation learning approaches, and demonstrate its flexibility and effectiveness for sample-efficient policy learning.
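As a rough illustration of how keypoints can be read out of a dense correspondence model, the sketch below assumes a trained network has already produced a per-pixel descriptor map (an H x W grid of C-dimensional vectors) and that a set of reference descriptors has been chosen, for example by sampling them from a reference image. Each keypoint is then located either by a hard nearest-descriptor match or by a spatial soft-argmax over negative descriptor distances. The function names, the temperature parameter, and the soft-argmax formulation are hypothetical choices made for this example; they are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# Assumed input: an H x W grid of C-dimensional descriptor vectors.
DescriptorMap = List[List[Sequence[float]]]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptor vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_keypoint(desc_map: DescriptorMap, ref: Sequence[float]) -> Tuple[int, int]:
    """Return the (row, col) pixel whose descriptor is closest to `ref`.

    Ties are broken in favor of the first pixel in row-major order.
    """
    best, best_rc = math.inf, (0, 0)
    for r, pixels in enumerate(desc_map):
        for c, d in enumerate(pixels):
            dist = squared_distance(d, ref)
            if dist < best:
                best, best_rc = dist, (r, c)
    return best_rc


def soft_keypoint(desc_map: DescriptorMap, ref: Sequence[float],
                  temperature: float = 1.0) -> Tuple[float, float]:
    """Spatial soft-argmax: softmax over negative distances, then the
    expected (row, col) under that distribution."""
    logits = [(-squared_distance(d, ref) / temperature, r, c)
              for r, pixels in enumerate(desc_map)
              for c, d in enumerate(pixels)]
    peak = max(logit for logit, _, _ in logits)  # subtract max for stability
    weights = [(math.exp(logit - peak), r, c) for logit, r, c in logits]
    total = sum(w for w, _, _ in weights)
    row = sum(w * r for w, r, _ in weights) / total
    col = sum(w * c for w, _, c in weights) / total
    return row, col


def keypoints(desc_map: DescriptorMap, refs: Sequence[Sequence[float]],
              temperature: float = 1.0) -> List[Tuple[float, float]]:
    """Locate one keypoint per reference descriptor (e.g. several per object)."""
    return [soft_keypoint(desc_map, ref, temperature) for ref in refs]


if __name__ == "__main__":
    # Toy 3 x 3 map: every pixel has descriptor [0, 0] except (1, 2).
    desc_map = [[[0.0, 0.0] for _ in range(3)] for _ in range(3)]
    desc_map[1][2] = [1.0, 1.0]
    print(hard_keypoint(desc_map, [1.0, 1.0]))
    print(keypoints(desc_map, [[1.0, 1.0]], temperature=0.01))
```

A low temperature makes the soft estimate behave like the hard match, while a high temperature spreads the weight across the image. The soft variant has the practical advantage that it yields sub-pixel coordinates and, when written in an autodiff framework, gradients with respect to the descriptors.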

updated: Tue Oct 11 2022 09:06:57 GMT+0000 (UTC)

published: Tue May 17 2022 13:15:07 GMT+0000 (UTC)

arXiv