Goal-constrained Sparse Reinforcement Learning for End-to-End Driving

Pranav Agarwal; Pierre de Beaucorps; Raoul de Charette

Deep reinforcement learning for end-to-end driving is limited by the need for complex reward engineering. Sparse rewards can circumvent this challenge but suffer from long training times and lead to sub-optimal policies. In this work, we explore full-control driving with only goal-constrained sparse rewards and propose a curriculum learning approach for end-to-end driving that uses only navigation view maps, which benefit from a small virtual-to-real domain gap. To address the complexity of multiple driving policies, we learn concurrent individual policies that are selected at inference by a navigation system. We demonstrate the ability of our proposal to generalize to unseen road layouts and to drive significantly longer than during training.
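
The three ideas named in the abstract (a goal-constrained sparse reward, a curriculum, and per-command policies chosen by a navigation system) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the abstract does not specify the goal radius, the curriculum schedule, or the set of navigation commands, so `radius`, the stage-indexed goal, and the command names `"left"`, `"right"` and `"straight"` are hypothetical.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Position = Tuple[float, float]
# Hypothetical policy type: maps an observation (here, any object) to an action label.
Policy = Callable[[object], str]


def sparse_goal_reward(position: Position, goal: Position, radius: float = 2.0) -> float:
    """Sparse reward: 1.0 only when the agent is within `radius` of the goal, else 0.0.

    The radius value is an assumption; no shaping terms are added.
    """
    return 1.0 if math.dist(position, goal) <= radius else 0.0


def curriculum_goal(route: Sequence[Position], stage: int) -> Position:
    """Assumed curriculum: place the goal further along the route as `stage` grows.

    The stage is clamped so the goal never goes past the last waypoint.
    """
    index = min(max(stage, 0), len(route) - 1)
    return route[index]


def select_policy(policies: Dict[str, Policy], command: str) -> Policy:
    """Pick the individual policy that matches the navigation command at inference."""
    return policies[command]


if __name__ == "__main__":
    route = [(5.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
    goal = curriculum_goal(route, stage=1)
    print(goal, sparse_goal_reward((9.0, 0.0), goal))

    # Placeholder policies that ignore the observation and return a fixed action.
    policies: Dict[str, Policy] = {
        "left": lambda obs: "steer_left",
        "right": lambda obs: "steer_right",
        "straight": lambda obs: "keep_lane",
    }
    print(select_policy(policies, "left")(None))
```

In this sketch the reward carries no information until the goal is reached, which is why some form of curriculum (here, an assumed gradual increase in goal distance) is a natural companion to it.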

updated: Sat Jul 31 2021 16:31:56 GMT+0000 (UTC)

published: Tue Mar 16 2021 16:39:09 GMT+0000 (UTC)

arXiv
