Predictive Experience Replay for Continual Visual Control and Forecasting

Wendong Zhang; Geng Chen; Xiangming Zhu; Siyu Gao; Yunbo Wang; Xiaokang Yang

Learning physical dynamics in a series of non-stationary environments is a challenging but essential task for model-based reinforcement learning (MBRL) with visual inputs. It requires the agent to consistently adapt to novel tasks without forgetting previous knowledge. In this paper, we present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting. The key assumption is that an ideal world model can provide a non-forgetting environment simulator, which enables the agent to optimize the policy in a multi-task learning manner based on the imagined trajectories from the world model. To this end, we first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting, which we call predictive experience replay. Finally, we extend these methods to continual RL and further address the value estimation problems with the exploratory-conservative behavior learning approach. Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks. It is also shown to effectively alleviate the forgetting of spatiotemporal dynamics in video prediction datasets with evolving domains.
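The two ideas named in the abstract, a task-specific Gaussian prior per task and replaying generated data for earlier tasks, can be illustrated with a toy sketch. This is not the authors' implementation: the names `MixturePrior`, `toy_decoder`, and `build_replay_batch` are hypothetical, the decoder is a fixed stand-in for a learned generative model, and "replay" here only means mixing real data from the current task with samples generated for previous tasks.

```python
import random


class MixturePrior:
    """Toy mixture prior: one diagonal Gaussian component per task (assumption)."""

    def __init__(self, latent_dim, seed=0):
        self.latent_dim = latent_dim
        self.rng = random.Random(seed)
        self.components = []  # one (mean, std) pair per task

    def add_task(self):
        # A real model would learn these parameters; here they are random.
        mean = [self.rng.gauss(0.0, 1.0) for _ in range(self.latent_dim)]
        std = [1.0] * self.latent_dim
        self.components.append((mean, std))
        return len(self.components) - 1

    def sample(self, task_id, n):
        # Draw n latents from the Gaussian component assigned to task_id.
        mean, std = self.components[task_id]
        return [
            [m + s * self.rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]
            for _ in range(n)
        ]


def toy_decoder(latents):
    # Stand-in for a frozen generator that maps latents to observations.
    return [[2.0 * z for z in latent] for latent in latents]


def build_replay_batch(prior, decoder, new_data, new_task_id, n_per_old_task):
    """Combine real data for the current task with generated data for old tasks."""
    batch = [(new_task_id, new_data)]
    for old_id in range(new_task_id):
        generated = decoder(prior.sample(old_id, n_per_old_task))
        batch.append((old_id, generated))
    return batch


prior = MixturePrior(latent_dim=4)
for _ in range(3):
    prior.add_task()

new_data = [[0.0] * 4 for _ in range(6)]  # placeholder data for task 2
batch = build_replay_batch(prior, toy_decoder, new_data,
                           new_task_id=2, n_per_old_task=5)
print([(task_id, len(samples)) for task_id, samples in batch])
# prints [(2, 6), (0, 5), (1, 5)]
```

Training on such a mixed batch is what lets a single model keep seeing data for earlier tasks without storing their original observations; how the paper generates and weights the replayed trajectories is not specified in the abstract.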

updated: Sun Mar 12 2023 05:08:03 GMT+0000 (UTC)

published: Sun Mar 12 2023 05:08:03 GMT+0000 (UTC)

arXiv
