Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation

Nicklas Hansen; Hao Su; Xiaolong Wang

While agents trained by Reinforcement Learning (RL) can solve increasingly challenging tasks directly from visual observations, generalizing learned skills to novel environments remains very challenging. Extensive use of data augmentation is a promising technique for improving generalization in RL, but it is often found to decrease sample efficiency and can even lead to divergence. In this paper, we investigate causes of instability when using data augmentation in common off-policy RL algorithms. We identify two problems, both rooted in high-variance Q-targets. Based on our findings, we propose a simple yet effective technique for stabilizing this class of algorithms under augmentation. We perform extensive empirical evaluation of image-based RL using both ConvNets and Vision Transformers (ViT) on a family of benchmarks based on DeepMind Control Suite, as well as in robotic manipulation tasks. Our method greatly improves stability and sample efficiency of ConvNets under augmentation, and achieves generalization results competitive with state-of-the-art methods for image-based RL in environments with unseen visuals. We further show that our method scales to RL with ViT-based architectures, and that data augmentation may be especially important in this setting.
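The abstract attributes the instability to high-variance Q-targets but does not spell out the proposed technique. As a rough illustration only, the sketch below shows one way target variance can be kept low: the TD target is computed from the unaugmented next observation, and augmentation is applied only to the input of the Q-prediction. Everything here is an assumption made for illustration and not the authors' implementation: the toy linear Q-function, the discrete action space, the `random_shift` augmentation, the equal 0.5 weighting of the two loss terms, and all function names.

```python
import numpy as np

def random_shift(obs, rng, pad=2):
    """Hypothetical augmentation: shift each image by up to `pad` pixels."""
    n, h, w = obs.shape
    padded = np.pad(obs, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    out = np.empty_like(obs)
    for i in range(n):
        dy, dx = rng.integers(0, 2 * pad + 1, size=2)
        out[i] = padded[i, dy:dy + h, dx:dx + w]
    return out

def q_values(weights, obs):
    """Toy linear Q-function; `weights` has shape (num_actions, h * w)."""
    return obs.reshape(obs.shape[0], -1) @ weights.T

def td_target(target_weights, reward, next_obs, gamma=0.99):
    """TD target from the *unaugmented* next observation (no randomness)."""
    return reward + gamma * q_values(target_weights, next_obs).max(axis=1)

def q_loss(weights, target_weights, obs, action, reward, next_obs, rng,
           gamma=0.99):
    """Mean squared TD error on clean and augmented inputs, sharing one target."""
    target = td_target(target_weights, reward, next_obs, gamma)
    idx = np.arange(obs.shape[0])
    q_clean = q_values(weights, obs)[idx, action]
    q_aug = q_values(weights, random_shift(obs, rng))[idx, action]
    # Assumed equal weighting of the clean and augmented terms.
    return 0.5 * np.mean((q_clean - target) ** 2) \
         + 0.5 * np.mean((q_aug - target) ** 2)
```

Because `td_target` never sees an augmented image, repeated calls on the same transition return the same value; only the prediction side is exposed to the augmentation's randomness.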

updated: Thu Dec 09 2021 16:25:39 GMT+0000 (UTC)

published: Thu Jul 01 2021 17:58:05 GMT+0000 (UTC)

arXiv
