Unsupervised Visual Attention and Invariance for Reinforcement Learning

Xudong Wang; Long Lian; Stella X. Yu

Vision-based reinforcement learning (RL) is successful, but generalizing it to unknown test environments remains challenging. Existing methods focus on training an RL policy that is universal to changing visual domains, whereas we focus on extracting a visual foreground that is universal, feeding clean, invariant vision to the RL policy learner. Our method is completely unsupervised, requiring neither manual annotations nor access to environment internals. Given videos of actions in a training environment, we learn how to extract foregrounds with unsupervised keypoint detection, followed by unsupervised visual attention to automatically generate a foreground mask per video frame. We can then introduce artificial distractors and train a model to reconstruct the clean foreground mask from noisy observations. Only this learned model is needed at test time to provide distraction-free visual input to the RL policy learner. Our Visual Attention and Invariance (VAI) method significantly outperforms the state of the art on visual domain generalization, gaining 15 to 49% more cumulative reward per episode on the DeepMind Control benchmark and 61 to 229% more on our DrawerWorld Manipulation benchmark. Our results demonstrate that it is not only possible to learn domain-invariant vision without any supervision, but that freeing RL from visual distractions also makes the policy more focused and thus far better.
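The distractor step in the abstract can be illustrated with a toy sketch. It only shows how a (noisy observation, clean foreground) training pair might be assembled once a foreground mask is available. The abstract does not specify the kind of distractors, the frame format, or the reconstruction network, so the uniform-noise background, the grayscale list-of-lists frames, and all function names below are assumptions made for illustration and are not the authors' implementation.

```python
import random

def apply_mask(frame, mask):
    """Keep pixels where mask is 1 and zero out the rest."""
    return [[p * m for p, m in zip(frow, mrow)]
            for frow, mrow in zip(frame, mask)]

def add_distractor(frame, mask, rng):
    """Replace background pixels (mask == 0) with uniform random noise.

    Assumption: uniform noise stands in for whatever artificial
    distractors the method actually uses.
    """
    return [[p if m == 1 else rng.random() for p, m in zip(frow, mrow)]
            for frow, mrow in zip(frame, mask)]

def make_training_pair(frame, mask, rng):
    """Return (noisy observation, clean foreground target)."""
    noisy = add_distractor(frame, mask, rng)
    clean = apply_mask(frame, mask)
    return noisy, clean

# Hypothetical 6x6 grayscale frame with a 2x2 foreground region.
rng = random.Random(0)
size = 6
frame = [[rng.random() for _ in range(size)] for _ in range(size)]
mask = [[1 if 2 <= r < 4 and 2 <= c < 4 else 0 for c in range(size)]
        for r in range(size)]

noisy, clean = make_training_pair(frame, mask, rng)
```

In this toy setting, masking the noisy frame with the same mask recovers the clean target exactly. A learned model would have to produce that target from `noisy` alone, which is what would let it stand in front of the policy at test time.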

updated: Fri Apr 16 2021 18:56:48 GMT+0000 (UTC)

published: Wed Apr 07 2021 05:28:01 GMT+0000 (UTC)

arXiv
