Learning to Drive Using Sparse Imitation Reinforcement Learning

Yuci Han; Alper Yilmaz

In this paper, we propose Sparse Imitation Reinforcement Learning (SIRL), a hybrid end-to-end control policy that combines sparse expert driving knowledge with a reinforcement learning (RL) policy for the autonomous driving (AD) task in the CARLA simulation environment. The sparse expert is built from hand-crafted rules; it is suboptimal, but it provides a risk-averse strategy by enforcing experience for critical scenarios such as pedestrian and vehicle avoidance and traffic light detection. As has been demonstrated, training an RL agent from scratch is data-inefficient and time-consuming, particularly for the urban driving task, because of the complexity of situations stemming from the vast state space. Our SIRL strategy addresses these problems by fusing the output distributions of the sparse expert policy and the RL policy to generate a composite driving policy. With the guidance of the sparse expert during the early training stage, the SIRL strategy accelerates training, keeps RL exploration from causing catastrophic outcomes, and ensures safe exploration. To some extent, the SIRL agent imitates the driving expert's behavior. At the same time, it continuously gains knowledge during training, so it keeps improving beyond the sparse expert and can surpass both the sparse expert and a traditional RL agent. We experimentally validate the efficacy of the proposed SIRL approach in a complex urban scenario within the CARLA simulator. We also compare the SIRL agent's performance with the traditional RL approach in terms of risk-averse exploration and learning efficiency. Finally, we demonstrate the SIRL agent's generalization ability by transferring its driving skill to unseen environments.
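
The abstract does not state how the two output distributions are fused, so the following is only an illustrative sketch. It assumes each policy outputs a one-dimensional Gaussian over a control value (for example steering) and combines them with a precision-weighted product of Gaussians, which is one common fusion rule and not necessarily the authors' method. The names `Gaussian` and `fuse` are hypothetical, and the expert is modeled as returning `None` outside critical scenarios to reflect its sparsity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gaussian:
    """1-D Gaussian over a control value (hypothetical container)."""
    mean: float
    std: float


def fuse(rl: Gaussian, expert: Optional[Gaussian]) -> Gaussian:
    """Fuse RL and sparse-expert outputs (assumed rule: product of Gaussians).

    When the sparse expert gives no advice (None), the RL distribution
    is returned unchanged.
    """
    if expert is None:
        return rl
    # Precision is the inverse variance; a more confident policy weighs more.
    p_rl = 1.0 / rl.std ** 2
    p_ex = 1.0 / expert.std ** 2
    var = 1.0 / (p_rl + p_ex)
    mean = var * (p_rl * rl.mean + p_ex * expert.mean)
    return Gaussian(mean, var ** 0.5)


# An uncertain RL policy early in training, and a confident expert
# reacting to a critical scenario (illustrative numbers only).
rl_out = Gaussian(mean=0.8, std=1.0)
expert_out = Gaussian(mean=-1.0, std=0.1)
fused = fuse(rl_out, expert_out)
```

Under this assumed rule, the composite action stays close to the expert's while the RL policy is still uncertain, and the RL policy gains influence as its own variance shrinks during training, which matches the abstract's description of early guidance followed by improvement beyond the expert.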

updated: Tue May 24 2022 15:03:11 GMT+0000 (UTC)

published: Tue May 24 2022 15:03:11 GMT+0000 (UTC)

arXiv
