Instruction-driven history-aware policies for robotic manipulations

Pierre-Louis Guhur; Shizhe Chen; Ricardo Garcia; Makarand Tapaswi; Ivan Laptev; Cordelia Schmid

In human environments, robots are expected to accomplish a variety of manipulation tasks given simple natural language instructions. Yet, robotic manipulation is extremely challenging as it requires fine-grained motor control, long-term memory as well as generalization to previously unseen tasks and environments. To address these challenges, we propose a unified transformer-based approach that takes into account multiple inputs. In particular, our transformer architecture integrates (i) natural language instructions and (ii) multi-view scene observations while (iii) keeping track of the full history of observations and actions. Such an approach enables learning dependencies between history and instructions and improves manipulation precision using multiple views. We evaluate our method on the challenging RLBench benchmark and on a real-world robot. Notably, our approach scales to 74 diverse RLBench tasks and outperforms the state of the art. We also address instruction-conditioned tasks and demonstrate excellent generalization to previously unseen variations.
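As a rough illustration of item (iii), the sketch below shows one plausible way to lay out instruction tokens, per-view observation tokens, and action tokens in a single sequence, with a mask that lets each step attend to the instruction and to all earlier steps. It is a minimal sketch and not the authors' implementation: the `Token` layout, the function names, and the masking rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """Hypothetical token descriptor (not taken from the paper)."""
    kind: str  # "instr", "obs", or "act"
    step: int  # time step index; -1 for instruction tokens
    view: int  # camera index for "obs" tokens; -1 otherwise


def build_sequence(num_instr_tokens: int, num_steps: int, num_views: int) -> List[Token]:
    """Lay out instruction tokens first, then one group of tokens per time step."""
    tokens = [Token("instr", -1, -1) for _ in range(num_instr_tokens)]
    for t in range(num_steps):
        # One observation token per camera view at this step (assumed layout).
        for v in range(num_views):
            tokens.append(Token("obs", t, v))
        # One action token per step (assumed layout).
        tokens.append(Token("act", t, -1))
    return tokens


def attention_mask(tokens: List[Token]) -> List[List[bool]]:
    """mask[i][j] is True when query token i may attend to key token j.

    Assumed rule: a token sees every token whose step is not later than its own.
    Instruction tokens carry step -1, so they are visible to all tokens,
    while they themselves only see other instruction tokens.
    """
    return [[key.step <= query.step for key in tokens] for query in tokens]


if __name__ == "__main__":
    seq = build_sequence(num_instr_tokens=2, num_steps=3, num_views=2)
    mask = attention_mask(seq)
    for tok, row in zip(seq, mask):
        print(f"{tok.kind:>5} step={tok.step:>2}", "".join("x" if m else "." for m in row))
```

Running the script prints one row per token, with `x` marking the positions it may attend to. Libraries differ on whether `True` in a boolean mask means "allowed" or "blocked", so the mask may need to be inverted before it is passed to a real attention layer.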

updated: Sat Dec 17 2022 18:12:32 GMT+0000 (UTC)

published: Sun Sep 11 2022 16:28:25 GMT+0000 (UTC)

arXiv
