Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings

Lili Chen; Kimin Lee; Aravind Srinivas; Pieter Abbeel

Recent advances in off-policy deep reinforcement learning (RL) have led to impressive success in complex tasks from visual observations. Experience replay improves sample efficiency by reusing past experiences, and convolutional neural networks (CNNs) process high-dimensional inputs effectively. However, such techniques demand high memory and computational bandwidth. In this paper, we present Stored Embeddings for Efficient Reinforcement Learning (SEER), a simple modification of existing off-policy RL methods that addresses these computational and memory requirements. To reduce the computational overhead of gradient updates in CNNs, we freeze the lower layers of CNN encoders early in training, because their parameters converge early. Additionally, we reduce memory requirements by storing low-dimensional latent vectors for experience replay instead of high-dimensional images, which enables an adaptive increase in replay buffer capacity, a useful technique in constrained-memory settings. In our experiments, we show that SEER does not degrade the performance of RL agents while significantly saving computation and memory across a diverse set of DeepMind Control environments and Atari games. Finally, we show that SEER is useful for computation-efficient transfer learning in RL, because the lower layers of CNNs extract generalizable features that can be used for different tasks and domains.
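The storage idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the lower encoder is already frozen, so its output for a given image never changes and can be stored in place of the image. The 84x84x3 observation size, the 50-dimensional latent, the linear-plus-ReLU stand-in for the frozen CNN layers, and all names (`frozen_lower_encoder`, `LatentReplayBuffer`, `capacity_for_budget`) are hypothetical choices made for illustration only.

```python
import numpy as np

IMAGE_SHAPE = (84, 84, 3)  # assumed observation size, uint8 pixels
LATENT_DIM = 50            # assumed latent size, float32 values


def frozen_lower_encoder(image, weights):
    """Stand-in for frozen lower CNN layers: a fixed linear map plus ReLU.

    The weights are never updated, so the latent for an image is stable
    and can be stored instead of the image itself.
    """
    x = image.reshape(-1).astype(np.float32) / 255.0
    return np.maximum(weights @ x, 0.0)


class LatentReplayBuffer:
    """Ring buffer that stores latents rather than raw images (simplified:
    next observations and done flags are omitted)."""

    def __init__(self, capacity, latent_dim):
        self.capacity = capacity
        self.latents = np.zeros((capacity, latent_dim), dtype=np.float32)
        self.actions = np.zeros(capacity, dtype=np.int64)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.ptr = 0
        self.size = 0

    def add(self, latent, action, reward):
        self.latents[self.ptr] = latent
        self.actions[self.ptr] = action
        self.rewards[self.ptr] = reward
        self.ptr = (self.ptr + 1) % self.capacity  # overwrite oldest when full
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size, rng):
        idx = rng.integers(0, self.size, size=batch_size)
        return self.latents[idx], self.actions[idx], self.rewards[idx]


def capacity_for_budget(budget_bytes, bytes_per_item):
    """Number of observations that fit in a fixed memory budget."""
    return budget_bytes // bytes_per_item


rng = np.random.default_rng(0)
n_pixels = int(np.prod(IMAGE_SHAPE))
weights = rng.standard_normal((LATENT_DIM, n_pixels)).astype(np.float32)

buffer = LatentReplayBuffer(capacity=100, latent_dim=LATENT_DIM)
for step in range(10):
    image = rng.integers(0, 256, size=IMAGE_SHAPE, dtype=np.uint8)
    latent = frozen_lower_encoder(image, weights)  # encode once, then discard image
    buffer.add(latent, action=step % 4, reward=1.0)

latents, actions, rewards = buffer.sample(4, rng)

image_bytes = n_pixels          # one byte per uint8 pixel
latent_bytes = LATENT_DIM * 4   # four bytes per float32 value
budget = 1000 * image_bytes     # memory that would hold 1000 raw images
images_fit = capacity_for_budget(budget, image_bytes)
latents_fit = capacity_for_budget(budget, latent_bytes)
print(images_fit, latents_fit)
```

Under these assumed sizes, the same memory budget holds roughly a hundred times more latents than raw images, which is the kind of headroom that would allow the replay capacity to grow once the encoder is frozen. The actual savings depend on the real latent dimensionality and on what else is stored per transition.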

updated: Thu Mar 04 2021 08:14:10 GMT+0000 (UTC)

published: Thu Mar 04 2021 08:14:10 GMT+0000 (UTC)

arXiv
