Decoupling Representation Learning from Reinforcement Learning

Adam Stooke; Kimin Lee; Pieter Abbeel; Michael Laskin

In an effort to overcome limitations of reward-driven feature learning in deep reinforcement learning (RL) from images, we propose decoupling representation learning from policy learning. To this end, we introduce a new unsupervised learning (UL) task, called Augmented Temporal Contrast (ATC), which trains a convolutional encoder to associate pairs of observations separated by a short time difference, under image augmentations and using a contrastive loss. In online RL experiments, we show that training the encoder exclusively using ATC matches or outperforms end-to-end RL in most environments. Additionally, we benchmark several leading UL algorithms by pre-training encoders on expert demonstrations and using them, with weights frozen, in RL agents; we find that agents using ATC-trained encoders outperform all others. We also train multi-task encoders on data from multiple environments and show generalization to different downstream RL tasks. Finally, we ablate components of ATC, and introduce a new data augmentation to enable replay of (compressed) latent images from pre-trained encoders when RL requires augmentation. Our experiments span visually diverse RL benchmarks in DeepMind Control, DeepMind Lab, and Atari, and our complete code is available at https://github.com/astooke/rlpyt/tree/master/rlpyt/ul.
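The abstract describes ATC only at a high level: pairs of observations a short time apart are augmented, encoded, and scored with a contrastive loss. Below is a minimal pure-Python sketch of that training signal. It is not the rlpyt implementation, and several details are assumptions:

- The augmentation is a random shift, implemented as edge padding followed by a random crop.
- Similarity is a bilinear score `anchor^T W positive`.
- The other positives in the batch serve as negatives (InfoNCE).
- `encode` is a hypothetical linear stand-in for the convolutional encoder.
- The same encoder is used on both sides. A full implementation may differ, for example by using a separate target encoder for the positive.

All function names are hypothetical.

```python
import math
import random


def random_shift(img, pad=1, rng=random):
    """Assumed augmentation: edge-pad a 2D image by `pad` pixels, then crop
    back to its original size at a random offset."""
    h, w = len(img), len(img[0])
    padded = [
        [img[min(max(r - pad, 0), h - 1)][min(max(c - pad, 0), w - 1)]
         for c in range(w + 2 * pad)]
        for r in range(h + 2 * pad)
    ]
    dr, dc = rng.randint(0, 2 * pad), rng.randint(0, 2 * pad)
    return [row[dc:dc + w] for row in padded[dr:dr + h]]


def encode(img, weights):
    """Hypothetical stand-in for the convolutional encoder: a linear map
    from the flattened image to a latent vector."""
    flat = [px for row in img for px in row]
    return [sum(wt * x for wt, x in zip(w_row, flat)) for w_row in weights]


def info_nce(anchors, positives, W):
    """Contrastive (InfoNCE) loss: anchor i must identify positive i among all
    positives in the batch, so the other positives act as negatives.
    Assumed similarity: the bilinear score anchor^T W positive."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = []
        for p in positives:
            wp = [sum(W[r][c] * p[c] for c in range(len(p))) for r in range(len(W))]
            logits.append(sum(a * b for a, b in zip(anchors[i], wp)))
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]  # cross-entropy with target class i
    return total / n


def atc_loss(observations, batch_size, k, weights, W, rng=random):
    """Sample anchor/positive pairs (o_t, o_{t+k}) from one trajectory,
    augment each independently, encode both, and score them with InfoNCE."""
    starts = rng.sample(range(len(observations) - k), batch_size)
    anchors = [encode(random_shift(observations[t], rng=rng), weights) for t in starts]
    positives = [encode(random_shift(observations[t + k], rng=rng), weights) for t in starts]
    return info_nce(anchors, positives, W)


# Usage on synthetic data: 20 frames of 4x4 pixels, 3-dimensional latents.
rng = random.Random(0)
obs = [[[rng.random() for _ in range(4)] for _ in range(4)] for _ in range(20)]
weights = [[rng.gauss(0, 0.5) for _ in range(16)] for _ in range(3)]
W = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
print(atc_loss(obs, batch_size=8, k=3, weights=weights, W=W, rng=rng))
```

This sketch only computes the loss. Training the encoder on it, with weights later frozen for RL as the abstract describes, would require an autodiff framework to back-propagate through `encode`.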

updated: Sun May 16 2021 20:44:18 GMT+0000 (UTC)

published: Mon Sep 14 2020 19:11:13 GMT+0000 (UTC)

arXiv
