GRI: General Reinforced Imitation and its Application to Vision-Based Autonomous Driving

Raphael Chekroun; Marin Toromanoff; Sascha Hornauer; Fabien Moutarde

Deep reinforcement learning (DRL) has been demonstrated to be effective for several complex decision-making applications such as autonomous driving and robotics. However, DRL is notoriously limited by its high sample complexity and its lack of stability. Prior knowledge, e.g. in the form of expert demonstrations, is often available but challenging to leverage to mitigate these issues. In this paper, we propose General Reinforced Imitation (GRI), a novel method which combines the benefits of exploration and expert data and is straightforward to implement on top of any off-policy RL algorithm. We make one simplifying hypothesis: expert demonstrations can be seen as perfect data whose underlying policy gets a constant high reward. Based on this assumption, GRI introduces the notion of offline demonstration agents. These agents send expert data, which is processed concurrently with, and indistinguishably from, the experiences coming from the online RL exploration agent. We show that our approach enables major improvements on vision-based autonomous driving in urban environments. We further validate the GRI method on MuJoCo continuous control tasks with different off-policy RL algorithms. Our method ranked first on the CARLA Leaderboard and outperforms World on Rails, the previous state-of-the-art, by 17%.
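The core idea in the abstract, expert transitions relabelled with a constant high reward and fed to an off-policy learner alongside exploration data, can be illustrated with a small replay-sampling sketch. This is not the authors' implementation: the class `MixedReplaySource`, the parameters `demo_reward` and `demo_ratio`, and their default values are all hypothetical, since the abstract does not state how the reward value or the mixing proportion are chosen.

```python
import random
from collections import deque
from typing import Any, List, NamedTuple


class Transition(NamedTuple):
    state: Any
    action: Any
    reward: float
    next_state: Any
    done: bool


class MixedReplaySource:
    """Hypothetical sketch: serves expert and online transitions in one stream."""

    def __init__(self, demo_transitions, demo_reward=1.0, demo_ratio=0.25,
                 capacity=10_000, seed=0):
        if not 0.0 <= demo_ratio <= 1.0:
            raise ValueError("demo_ratio must be in [0, 1]")
        # Assumption: every expert transition gets the same constant reward,
        # mirroring the "constant high reward" hypothesis in the abstract.
        self.demos = [t._replace(reward=demo_reward) for t in demo_transitions]
        self.online = deque(maxlen=capacity)  # exploration experience
        self.demo_ratio = demo_ratio          # assumed mixing probability
        self.rng = random.Random(seed)

    def add_online(self, transition: Transition) -> None:
        """Store a transition collected by the online exploration agent."""
        self.online.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        """Draw a batch; the learner cannot tell demo from online samples."""
        if not self.demos and not self.online:
            raise ValueError("no transitions available to sample")
        batch = []
        for _ in range(batch_size):
            use_demo = bool(self.demos) and (
                not self.online or self.rng.random() < self.demo_ratio
            )
            source = self.demos if use_demo else self.online
            batch.append(self.rng.choice(source))
        return batch


# Usage: expert data recorded without rewards is relabelled on construction.
expert = [Transition(state=i, action=0, reward=0.0, next_state=i + 1, done=False)
          for i in range(5)]
source = MixedReplaySource(expert, demo_reward=1.0, demo_ratio=0.25)
source.add_online(Transition(state=100, action=1, reward=0.1,
                             next_state=101, done=False))
batch = source.sample(8)
```

Because the batch is a flat list of ordinary transitions, any off-policy update rule that consumes `(state, action, reward, next_state, done)` tuples could in principle be applied to it unchanged, which is the property the abstract describes as processing the two data streams indistinguishably.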

updated: Tue May 17 2022 15:38:30 GMT+0000 (UTC)

published: Tue Nov 16 2021 15:52:54 GMT+0000 (UTC)

arXiv
