Sparse Graphical Memory for Robust Planning

Scott Emmons; Ajay Jain; Michael Laskin; Thanard Kurutach; Pieter Abbeel; Deepak Pathak


To operate effectively in the real world, agents should be able to act from high-dimensional raw sensory input such as images and achieve diverse goals across long time-horizons. Current deep reinforcement and imitation learning methods can learn directly from high-dimensional inputs but do not scale well to long-horizon tasks. In contrast, classical graphical methods like A* search are able to solve long-horizon tasks, but assume that the state space is abstracted away from raw sensory input. Recent works have attempted to combine the strengths of deep learning and classical planning; however, dominant methods in this domain are still quite brittle and scale poorly with the size of the environment. We introduce Sparse Graphical Memory (SGM), a new data structure that stores states and feasible transitions in a sparse memory. SGM aggregates states according to a novel two-way consistency objective, adapting classic state aggregation criteria to goal-conditioned RL: two states are redundant when they are interchangeable both as goals and as starting states. Theoretically, we prove that merging nodes according to two-way consistency leads to an increase in shortest path lengths that scales only linearly with the merging threshold. Experimentally, we show that SGM significantly outperforms current state-of-the-art methods on long-horizon, sparse-reward visual navigation tasks. Project video and code are available at https://mishalaskin.github.io/sgm/
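As a rough illustration of the two-way consistency idea, here is a minimal Python sketch. It is not the authors' implementation. It assumes a precomputed matrix `dist`, where `dist[i][j]` estimates how many steps it takes to reach state j from state i. It takes the consistency gap to be the maximum absolute difference over all states, builds the memory with a simple greedy pass, and plans with plain Dijkstra. The function names and the `max_edge` cutoff are hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def consistency_gap(dist: Matrix, a: int, b: int) -> float:
    """Largest disagreement between states a and b, used as starts or as goals.

    Assumption: dist[i][j] estimates the steps needed to reach state j from
    state i (e.g. read off a goal-conditioned value function).
    """
    n = len(dist)
    # Interchangeable as starting states: similar cost to reach every state w.
    out_gap = max(abs(dist[a][w] - dist[b][w]) for w in range(n))
    # Interchangeable as goals: every state w needs similar cost to reach a and b.
    in_gap = max(abs(dist[w][a] - dist[w][b]) for w in range(n))
    return max(out_gap, in_gap)


def build_sparse_memory(dist: Matrix, tau: float) -> List[int]:
    """Greedily keep a state unless it is within gap tau of a node already kept."""
    kept: List[int] = []
    for s in range(len(dist)):
        if all(consistency_gap(dist, s, k) > tau for k in kept):
            kept.append(s)
    return kept


def plan(dist: Matrix, nodes: List[int], start: int, goal: int,
         max_edge: float) -> Tuple[float, List[int]]:
    """Dijkstra over the memory; edges join nodes at most max_edge steps apart."""
    best: Dict[int, float] = {start: 0.0}
    parent: Dict[int, int] = {}
    frontier = [(0.0, start)]
    while frontier:
        cost, u = heapq.heappop(frontier)
        if u == goal:
            # Walk parent links back to the start to recover the path.
            path = [u]
            while path[-1] != start:
                path.append(parent[path[-1]])
            return cost, path[::-1]
        if cost > best[u]:
            continue  # stale heap entry
        for v in nodes:
            w = dist[u][v]
            if v != u and w <= max_edge and cost + w < best.get(v, float("inf")):
                best[v] = cost + w
                parent[v] = u
                heapq.heappush(frontier, (best[v], v))
    return float("inf"), []
```

On a six-state chain with `dist[i][j] = |i - j|` and `tau=1`, the greedy pass keeps states 0, 2, and 4. Planning from 0 to 4 with `max_edge=2` then returns a cost of 4.0 along [0, 2, 4], the same length as in the full chain. Setting `tau=0` merges nothing and keeps all six states.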

updated: Thu Nov 12 2020 21:37:49 GMT+0000 (UTC)

published: Fri Mar 13 2020 17:59:32 GMT+0000 (UTC)

arXiv
