Continual Reinforcement Learning in 3D Non-stationary Environments

Vincenzo Lomonaco; Karan Desai; Eugenio Culurciello; Davide Maltoni

High-dimensional, constantly changing environments pose a hard challenge for current reinforcement learning techniques. Today's artificial agents are often trained offline in simulation under very static and controlled conditions, so that training observations can be thought of as sampled i.i.d. from the entire observation space. However, in real-world settings, the environment is often non-stationary and subject to frequent, unpredictable changes. In this paper we propose and openly release CRLMaze, a new benchmark for learning continually through reinforcement in a complex 3D non-stationary task based on ViZDoom and subject to several environmental changes. We then introduce an end-to-end, model-free continual reinforcement learning strategy that shows competitive results with respect to four different baselines and requires no access to additional supervised signals or to previously encountered environmental conditions or observations.

updated: Tue Apr 21 2020 14:57:48 GMT+0000 (UTC)

published: Fri May 24 2019 09:38:42 GMT+0000 (UTC)

arXiv
