Reinforcement Learning with Videos: Combining Offline Observations with Interaction

Karl Schmeckpeper; Oleh Rybkin; Kostas Daniilidis; Sergey Levine; Chelsea Finn

Reinforcement learning is a powerful framework for robots to acquire skills from experience, but often requires a substantial amount of online data collection. As a result, it is difficult to collect sufficiently diverse experiences that are needed for robots to generalize broadly. Videos of humans, on the other hand, are a readily available source of broad and interesting experiences. In this paper, we consider the question: can we perform reinforcement learning directly on experience collected by humans? This problem is particularly difficult, as such videos are not annotated with actions and exhibit substantial visual domain shift relative to the robot's embodiment. To address these challenges, we propose a framework for reinforcement learning with videos (RLV). RLV learns a policy and value function using experience collected by humans in combination with data collected by robots. In our experiments, we find that RLV is able to leverage such videos to learn challenging vision-based skills with less than half as many samples as RL methods that learn from scratch.

updated: Thu Nov 12 2020 17:15:48 GMT+0000 (UTC)

published: Thu Nov 12 2020 17:15:48 GMT+0000 (UTC)

arXiv
