Hierarchical Neural Dynamic Policies

Shikhar Bahl; Abhinav Gupta; Deepak Pathak

階層的ニューラル動的ポリシー

高次元の画像入力から学びながら、現実世界の動的タスクの目に見えない構成への一般化の問題に取り組みます。非線形動的システムベースの方法のファミリーは、動的ロボットの動作を正常に実証しましたが、画像入力から学習するだけでなく、目に見えない構成に一般化することも困難です。最近の研究では、深いネットワークポリシーを使用し、アクションを再パラメーター化して動的システムの構造を埋め込むことでこの問題に取り組んでいますが、イメージゴールの構成が多様なドメインでは依然として苦労しているため、一般化するのは困難です。このホワイトペーパーでは、階層型ニューラル動的ポリシー（H-NDP）と呼ばれる階層型の深いポリシー学習フレームワークに動的システムの構造を埋め込むことを活用して、この二分法に対処します。 H-NDPは、深い動的システムを多様なデータに直接適合させる代わりに、状態空間の小さな領域でローカルの動的システムベースのポリシーを学習し、それらを高次からのみ動作するグローバルな動的システムベースのポリシーに抽出することによってカリキュラムを形成します。次元画像。 H-NDPはさらに、スムーズな軌道を提供し、現実の世界で強力な安全上の利点を提供します。実世界（数字の書き込み、すくい取り、注ぐ）とシミュレーション（キャッチ、スロー、ピッキング）の両方で動的タスクの広範な実験を実行します。 H-NDPは、模倣と強化学習の両方のセットアップと簡単に統合でき、最先端の結果を達成できることを示しています。ビデオの結果はhttps://shikharbahl.github.io/hierarchical-ndps/にあります

We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input. The family of nonlinear dynamical system-based methods have successfully demonstrated dynamic robot behaviors but have difficulty in generalizing to unseen configurations as well as learning from image inputs. Recent works approach this issue by using deep network policies and reparameterize actions to embed the structure of dynamical systems but still struggle in domains with diverse configurations of image goals, and hence, find it difficult to generalize. In this paper, we address this dichotomy by leveraging embedding the structure of dynamical systems in a hierarchical deep policy learning framework, called Hierarchical Neural Dynamical Policies (H-NDPs). Instead of fitting deep dynamical systems to diverse data directly, H-NDPs form a curriculum by learning local dynamical system-based policies on small regions in state-space and then distill them into a global dynamical system-based policy that operates only from high-dimensional images. H-NDPs additionally provide smooth trajectories, a strong safety benefit in the real world. We perform extensive experiments on dynamic tasks both in the real world (digit writing, scooping, and pouring) and simulation (catching, throwing, picking). We show that H-NDPs are easily integrated with both imitation as well as reinforcement learning setups and achieve state-of-the-art results. Video results are at https://shikharbahl.github.io/hierarchical-ndps/

updated: Mon Jul 12 2021 17:59:58 GMT+0000 (UTC)

published: Mon Jul 12 2021 17:59:58 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト