Scaling Robot Learning with Semantically Imagined Experience

Tianhe Yu; Ted Xiao; Austin Stone; Jonathan Tompson; Anthony Brohan; Su Wang; Jaspiar Singh; Clayton Tan; Dee M; Jodilyn Peralta; Brian Ichter; Karol Hausman; Fei Xia

Recent advances in robot learning have shown promise in enabling robots to perform a variety of manipulation tasks and generalize to novel scenarios. One of the key contributing factors to this progress is the scale of robot data used to train the models. To obtain large-scale datasets, prior approaches have relied on either demonstrations requiring high human involvement or engineering-heavy autonomous data collection schemes, both of which are challenging to scale. To mitigate this issue, we propose an alternative route and leverage text-to-image foundation models widely used in computer vision and natural language processing to obtain meaningful data for robot learning without requiring additional robot data. We term our method Robot Learning with Semantically Imagined Experience (ROSIE). Specifically, we make use of state-of-the-art text-to-image diffusion models and perform aggressive data augmentation on top of our existing robotic manipulation datasets by inpainting various unseen objects for manipulation, backgrounds, and distractors with text guidance. Through extensive real-world experiments, we show that manipulation policies trained on data augmented this way are able to solve completely unseen tasks with new objects and behave more robustly with respect to novel distractors. In addition, we find that training with the diffusion-based data augmentation improves the robustness and generalization of high-level robot learning tasks such as success detection. The project's website and videos can be found at diffusion-rosie.github.io.
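The augmentation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Episode` container, the `augment_episode` helper, and the `stub_inpainter` stand-in for a text-guided diffusion model are all hypothetical names introduced here, and the sketch assumes that the recorded actions are reused unchanged while only the frames and the language instruction are rewritten. How masks are obtained is outside the scope of this sketch; they are simply passed in.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence

Frame = List[List[int]]   # toy grayscale image: rows of pixel values
Mask = List[List[bool]]   # True where the region should be repainted


@dataclass(frozen=True)
class Episode:
    """Hypothetical container for one recorded robot episode."""
    frames: List[Frame]
    actions: List[Sequence[float]]
    instruction: str


# Hypothetical signature for a text-guided inpainting model.
Inpainter = Callable[[Frame, Mask, str], Frame]


def stub_inpainter(frame: Frame, mask: Mask, prompt: str) -> Frame:
    """Stand-in for a diffusion model: fills masked pixels with a
    single value derived from the prompt. A real model would
    synthesize image content instead."""
    fill = sum(prompt.encode("utf-8")) % 256
    return [
        [fill if masked else pixel for pixel, masked in zip(row, mask_row)]
        for row, mask_row in zip(frame, mask)
    ]


def augment_episode(
    episode: Episode,
    masks: List[Mask],
    prompt: str,
    new_instruction: str,
    inpaint: Inpainter,
) -> Episode:
    """Return a copy of the episode whose masked regions are repainted
    according to the prompt. Actions are kept as recorded (an assumption
    of this sketch), so no new robot data is collected."""
    if len(masks) != len(episode.frames):
        raise ValueError("expected one mask per frame")
    new_frames = [inpaint(f, m, prompt) for f, m in zip(episode.frames, masks)]
    return replace(episode, frames=new_frames, instruction=new_instruction)


# Toy usage: repaint the centre pixel of every frame.
frame = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
mask = [[False, False, False], [False, True, False], [False, False, False]]
original = Episode(
    frames=[frame, frame],
    actions=[(0.1, 0.0), (0.0, 0.2)],
    instruction="pick up the can",
)
augmented = augment_episode(
    original,
    [mask, mask],
    prompt="a toy block",
    new_instruction="pick up the toy block",
    inpaint=stub_inpainter,
)
```

In this sketch the inpainting model is injected as a plain callable, so the stub could in principle be swapped for a real text-to-image inpainting model without changing `augment_episode`. The prompt and instructions in the usage example are made up for illustration.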

updated: Wed Feb 22 2023 18:47:51 GMT+0000 (UTC)

published: Wed Feb 22 2023 18:47:51 GMT+0000 (UTC)

arXiv
