Bridging the Imitation Gap by Adaptive Insubordination

Luca Weihs; Unnat Jain; Iou-Jen Liu; Jordi Salvador; Svetlana Lazebnik; Aniruddha Kembhavi; Alexander Schwing

In practice, imitation learning is preferred over pure reinforcement learning whenever it is possible to design a teaching agent to provide expert supervision. However, we show that when the teaching agent makes decisions with access to privileged information that is unavailable to the student, this information is marginalized during imitation learning, resulting in an "imitation gap" and, potentially, poor results. Prior work bridges this gap via a progression from imitation learning to reinforcement learning. While often successful, gradual progression fails for tasks that require frequent switches between exploration and memorization. To better address these tasks and alleviate the imitation gap, we propose "Adaptive Insubordination" (ADVISOR). ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration. On a suite of challenging tasks set within gridworlds, multi-agent particle environments, and high-fidelity 3D simulators, we show that on-the-fly switching with ADVISOR outperforms pure imitation, pure reinforcement learning, and their sequential and parallel combinations.
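The abstract says that imitation and reward-based losses are weighted dynamically, but it does not give the weighting rule. The sketch below shows one plausible way such a per-state weight could work, and should not be read as the authors' implementation. Everything named here is an assumption made for illustration: the auxiliary imitation-only policy (`aux_probs`), the total-variation distance, the exponential form of the weight, and the parameter `alpha`. The idea is that when an imitation-only policy can match the teacher in a state, the student imitates; when it cannot (a symptom of the imitation gap), the weight shifts toward the reward-based loss.

```python
import math


def cross_entropy(teacher_probs, student_probs, eps=1e-12):
    """Cross-entropy of the student's action distribution against the teacher's."""
    return -sum(t * math.log(s + eps) for t, s in zip(teacher_probs, student_probs))


def imitation_weight(teacher_probs, aux_probs, alpha=4.0):
    """Hypothetical weight in (0, 1].

    Close to 1 when the auxiliary imitation-only policy agrees with the
    teacher, close to 0 when it cannot reproduce the teacher's choice.
    The distance measure and exponential form are illustrative assumptions.
    """
    total_variation = 0.5 * sum(abs(t - a) for t, a in zip(teacher_probs, aux_probs))
    return math.exp(-alpha * total_variation)


def mixed_loss(teacher_probs, student_probs, aux_probs, rl_loss, alpha=4.0):
    """Blend the imitation loss and a given RL loss for a single state."""
    w = imitation_weight(teacher_probs, aux_probs, alpha)
    il_loss = cross_entropy(teacher_probs, student_probs)
    return w * il_loss + (1.0 - w) * rl_loss, w


if __name__ == "__main__":
    teacher = [1.0, 0.0]
    student = [0.5, 0.5]
    # State where imitation is feasible: the auxiliary policy matches the teacher.
    print(mixed_loss(teacher, student, aux_probs=[1.0, 0.0], rl_loss=2.0))
    # State where it is not: the weight moves toward the RL loss.
    print(mixed_loss(teacher, student, aux_probs=[0.0, 1.0], rl_loss=2.0))
```

Because the weight is computed per state rather than on a fixed schedule, the blend can change from one step to the next, which is the "on-the-fly switching" the abstract contrasts with a gradual progression from imitation to reinforcement learning.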

updated: Fri Dec 03 2021 18:53:42 GMT+0000 (UTC)

published: Thu Jul 23 2020 17:59:57 GMT+0000 (UTC)

arXiv
