Differentiable Architecture Search for Reinforcement Learning

Yingjie Miao; Xingyou Song; John D. Co-Reyes; Daiyi Peng; Summer Yue; Eugene Brevdo; Aleksandra Faust

In this paper, we investigate a fundamental question: to what extent are gradient-based neural architecture search (NAS) techniques applicable to RL? Using the original DARTS as a convenient baseline, we discover that the discrete architectures found can achieve up to 250% performance compared to manual architecture designs on both discrete and continuous action space environments, across off-policy and on-policy RL algorithms, at only 3x more computation time. Furthermore, through numerous ablation studies, we systematically verify that DARTS not only correctly upweights operations during its supernet phase, but also gradually improves the resulting discrete cells up to 30x more efficiently than random search, suggesting that DARTS is a surprisingly effective tool for improving architectures in RL.
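The two stages named in the abstract, a supernet phase in which candidate operations are weighted and a later discrete cell, can be illustrated with a minimal sketch. This is not the authors' implementation: the candidate operations, the scalar input, and the names `CANDIDATE_OPS`, `mixed_op`, and `discretize` are assumptions made for illustration, and the gradient updates that would actually learn the architecture weights are omitted.

```python
import math

# Hypothetical candidate operations for one edge of a cell (not from the paper).
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "relu": lambda x: max(0.0, x),
    "tanh": math.tanh,
    "zero": lambda x: 0.0,
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, alphas):
    """Supernet phase: blend every candidate op, weighted by softmax(alphas)."""
    weights = softmax(alphas)
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def discretize(alphas):
    """After search: keep only the op with the largest architecture weight."""
    names = list(CANDIDATE_OPS)
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]


if __name__ == "__main__":
    alphas = [0.1, 2.0, 0.3, -1.0]  # assumed, hand-picked architecture weights
    print(mixed_op(1.5, alphas))    # continuous relaxation used during search
    print(discretize(alphas))       # operation kept in the discrete cell
```

In this sketch, "upweighting" an operation means raising its entry in `alphas`, which shifts the softmax mixture toward that operation and eventually makes it the one `discretize` selects.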

updated: Tue May 24 2022 17:51:21 GMT+0000 (UTC)

published: Fri Jun 04 2021 03:08:43 GMT+0000 (UTC)

arXiv
