Self-Supervised Learning of State Estimation for Manipulating Deformable Linear Objects

Mengyuan Yan; Yilin Zhu; Ning Jin; Jeannette Bohg

We demonstrate model-based, visual robot manipulation of linear deformable objects. Our approach is based on a state-space representation of the physical system that the robot aims to control. This choice has multiple advantages, including the ease of incorporating physics priors into the dynamics model and the perception model, and the ease of planning manipulation actions. In addition, physical states can naturally represent object instances of different appearances. Therefore, dynamics in the state space can be learned in one setting and used directly in other, visually different settings. This is in contrast to dynamics learned in pixel space or latent space, where generalization across visual differences is not guaranteed. The state-space approach poses two challenges: estimating the high-dimensional state of a deformable object from raw images, for which annotations on real data are very expensive, and finding a dynamics model that is accurate, generalizable, and efficient to compute. We are the first to demonstrate self-supervised training of rope state estimation on real images, without requiring expensive annotations. This is achieved by our novel self-supervised learning objective, which generalizes across a wide range of visual appearances. With the estimated rope states, we train a fast and differentiable neural network dynamics model that encodes the physics of mass-spring systems. Our method predicts future states more accurately than models that involve neither explicit state estimation nor a physics prior, while using only 3% of the training data. We also show that our approach achieves more efficient manipulation, both in simulation and on a real robot, when used within a model predictive controller.
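As a rough illustration of the mass-spring physics the abstract refers to, the sketch below treats a rope as a chain of point masses joined by Hooke's-law springs and advances it by one semi-implicit Euler step. This is not the authors' neural network dynamics model. The function names (`spring_forces`, `step`) and all parameter values (`rest_len`, `stiffness`, `damping`, `mass`, `dt`) are assumptions chosen only for this example.

```python
import numpy as np

def spring_forces(pos, rest_len, stiffness):
    """Net Hooke's-law force on each node from its adjacent rope segments.

    pos: (N, D) array of node positions along the rope (hypothetical state layout).
    """
    seg = pos[1:] - pos[:-1]                              # (N-1, D) segment vectors
    length = np.linalg.norm(seg, axis=1, keepdims=True)   # (N-1, 1) segment lengths
    unit = seg / np.maximum(length, 1e-9)                 # guard against zero-length segments
    tension = stiffness * (length - rest_len) * unit      # pulls node i toward node i+1 when stretched
    forces = np.zeros_like(pos)
    forces[:-1] += tension                                # force on the first node of each segment
    forces[1:] -= tension                                 # equal and opposite force on the second node
    return forces

def step(pos, vel, rest_len=0.1, stiffness=50.0, damping=2.0, mass=1.0, dt=0.01):
    """One semi-implicit Euler step: update velocity first, then position."""
    acc = (spring_forces(pos, rest_len, stiffness) - damping * vel) / mass
    vel = vel + dt * acc
    pos = pos + dt * vel
    return pos, vel

# Usage: a two-node rope stretched to twice its rest length contracts.
pos = np.array([[0.0, 0.0], [0.2, 0.0]])
vel = np.zeros_like(pos)
new_pos, new_vel = step(pos, vel)
```

Because every operation here is differentiable, a step of this kind could in principle sit inside a gradient-based planner such as a model predictive controller, which is the role the abstract assigns to the learned model.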

updated: Tue Oct 06 2020 04:06:53 GMT+0000 (UTC)

published: Thu Nov 14 2019 18:04:51 GMT+0000 (UTC)

arXiv
