Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning

Chenjia Bai; Peng Liu; Kaiyu Liu; Lingxiao Wang; Yingnan Zhao; Lei Han

Efficient exploration remains a challenging problem in reinforcement learning, especially for tasks where extrinsic rewards from the environment are sparse or even disregarded entirely. Significant advances based on intrinsic motivation show promising results in simple environments but often get stuck in environments with multimodal and stochastic dynamics. In this work, we propose a variational dynamic model based on conditional variational inference to model this multimodality and stochasticity. We treat the environmental state-action transition as a conditional generative process, generating the next-state prediction conditioned on the current state, the action, and a latent variable; this provides a better understanding of the dynamics and leads to better exploration performance. We derive an upper bound on the negative log-likelihood of the environmental transition and use that bound as the intrinsic reward for exploration, which allows the agent to learn skills through self-supervised exploration without observing extrinsic rewards. We evaluate the proposed method on several image-based simulation tasks and a real robotic manipulation task. Our method outperforms several state-of-the-art environment-model-based exploration approaches.
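
To make the intrinsic reward concrete, here is a minimal sketch of how such a bound could be computed. It assumes the standard conditional-VAE form, -log p(s'|s,a) <= E_q[-log p(s'|s,a,z)] + KL(q(z|s,a,s') || p(z|s,a)), with diagonal Gaussian distributions and a single sampled latent z; the abstract does not state the exact form used in the paper. The function names and the idea of passing in precomputed decoder, posterior, and prior parameters are illustrative assumptions, not the authors' implementation.

```python
import math


def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    return sum(
        0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )


def gaussian_nll(x, mu, var):
    """Negative log-density of x under a diagonal Gaussian N(mu, var)."""
    return sum(
        0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mu, var)
    )


def intrinsic_reward(next_state, recon_mu, recon_var, mu_q, var_q, mu_p, var_p):
    """Single-sample estimate of the upper bound on -log p(s' | s, a).

    recon_mu, recon_var: hypothetical decoder output for one sampled latent z.
    mu_q, var_q: hypothetical posterior q(z | s, a, s').
    mu_p, var_p: hypothetical prior p(z | s, a).
    """
    reconstruction = gaussian_nll(next_state, recon_mu, recon_var)
    regularizer = gaussian_kl(mu_q, var_q, mu_p, var_p)
    return reconstruction + regularizer


# A well-predicted transition versus a surprising one (toy 1-D numbers).
familiar = intrinsic_reward([0.0], [0.0], [1.0], [0.0], [1.0], [0.0], [1.0])
novel = intrinsic_reward([2.0], [0.0], [1.0], [1.0], [1.0], [0.0], [1.0])
```

In this toy setup the surprising transition receives the larger reward, which is the behaviour that drives the agent toward transitions its dynamics model explains poorly.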

updated: Tue Apr 02 2024 02:09:15 GMT+0000 (UTC)

published: Sat Oct 17 2020 09:54:51 GMT+0000 (UTC)

arXiv
