Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations

Cong Lu; Philip J. Ball; Tim G. J. Rudner; Jack Parker-Holder; Michael A. Osborne; Yee Whye Teh

Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, offline reinforcement learning from visual observations with continuous action spaces remains under-explored, with a limited understanding of the key challenges in this complex domain. In this paper, we establish simple baselines for continuous control in the visual domain and introduce a suite of benchmarking tasks for offline reinforcement learning from visual observations designed to better represent the data distributions present in real-world offline RL problems and guided by a set of desiderata for offline RL from visual observations, including robustness to visual distractions and visually identifiable changes in dynamics. Using this suite of benchmarking tasks, we show that simple modifications to two popular vision-based online reinforcement learning algorithms, DreamerV2 and DrQ-v2, suffice to outperform existing offline RL methods and establish competitive baselines for continuous control in the visual domain. We rigorously evaluate these algorithms and perform an empirical evaluation of the differences between state-of-the-art model-based and model-free offline RL methods for continuous control from visual observations. All code and data used in this evaluation are open-sourced to facilitate progress in this domain.

updated: Thu Jul 06 2023 16:46:11 GMT+0000 (UTC)

published: Thu Jun 09 2022 22:08:47 GMT+0000 (UTC)

arXiv
