Sparse Curriculum Reinforcement Learning for End-to-End Driving

Pranav Agarwal; Pierre de Beaucorps; Raoul de Charette

Deep reinforcement learning for end-to-end driving is limited by the need for complex reward engineering. Sparse rewards can circumvent this challenge but suffer from long training times and lead to sub-optimal policies. In this work, we explore driving using only goal-conditioned sparse rewards and propose a curriculum learning approach for end-to-end driving using only navigation view maps, which benefit from a small virtual-to-real domain gap. To address the complexity of multiple driving policies, we learn concurrent individual policies, which are selected at inference by a navigation system. We demonstrate the ability of our proposal to generalize to unseen road layouts and to drive for longer than during training.

updated: Tue Mar 16 2021 16:39:09 GMT+0000 (UTC)

published: Tue Mar 16 2021 16:39:09 GMT+0000 (UTC)

arXiv
