Coaching a Teachable Student

Jimuyang Zhang; Zanming Huang; Eshed Ohn-Bar

We propose a novel knowledge distillation framework for effectively teaching a sensorimotor student agent to drive under the supervision of a privileged teacher agent. Current distillation methods for sensorimotor agents tend to result in suboptimal learned driving behavior by the student, which we hypothesize is due to inherent differences between the inputs, modeling capacities, and optimization processes of the two agents. We develop a novel distillation scheme that addresses these limitations and closes the gap between the sensorimotor agent and its privileged teacher. Our key insight is to design a student that learns to align its input features with the teacher's privileged Bird's Eye View (BEV) space. The student can then benefit from direct supervision by the teacher over its internal representation learning. To scaffold the difficult sensorimotor learning task, the student model is optimized via a student-paced coaching mechanism with various forms of auxiliary supervision. We further propose a high-capacity, imitation-learned privileged agent that surpasses prior privileged agents in CARLA and ensures that the student learns safe driving behavior. Our proposed sensorimotor agent is a robust image-based behavior cloning agent in CARLA that improves over current models by over 20.6% in driving score without requiring LiDAR, historical observations, model ensembles, on-policy data aggregation, or reinforcement learning.
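The idea of supervising the student's internal representation, and not only its driving output, can be illustrated with a minimal sketch. This is not the paper's loss: the function names, the choice of L1 and squared-error terms, and the `feature_weight` parameter are all assumptions made for illustration, and the BEV feature maps are represented as flat lists of floats.

```python
from typing import Sequence


def mean_absolute_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average absolute difference between two equal-length vectors."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average squared difference between two equal-length vectors."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def distillation_loss(
    student_features: Sequence[float],
    teacher_features: Sequence[float],
    student_action: Sequence[float],
    teacher_action: Sequence[float],
    feature_weight: float = 0.5,  # hypothetical trade-off weight, not from the paper
) -> float:
    """Hypothetical combined objective for feature-aligned distillation.

    output term:  the student imitates the teacher's driving output.
    feature term: the student's features, assumed to be already projected
                  into the teacher's BEV space, are pulled toward the
                  teacher's features.
    """
    output_term = mean_absolute_error(student_action, teacher_action)
    feature_term = mean_squared_error(student_features, teacher_features)
    return output_term + feature_weight * feature_term


# Toy values: two feature cells and a two-dimensional action.
loss = distillation_loss(
    student_features=[1.0, 2.0],
    teacher_features=[1.0, 4.0],
    student_action=[0.0, 1.0],
    teacher_action=[0.5, 1.0],
)
print(loss)  # 1.25
```

In this sketch the feature term is only meaningful because both feature vectors live in the same space, which is why aligning the student's image features with the teacher's BEV space comes first. The student-paced coaching mechanism and the auxiliary supervision described in the abstract are not modeled here.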

updated: Fri Jun 16 2023 17:59:38 GMT+0000 (UTC)

published: Fri Jun 16 2023 17:59:38 GMT+0000 (UTC)

arXiv
