Multimedia Generative Script Learning for Task Planning

Qingyun Wang; Manling Li; Hou Pong Chan; Lifu Huang; Julia Hockenmaier; Girish Chowdhary; Heng Ji

Goal-oriented generative script learning aims to generate subsequent steps based on a goal, an essential task for helping robots perform stereotypical activities of daily life. We show that performance on this task improves when historical states are captured not only by the linguistic instructions given to people but also by the additional information in accompanying images. We therefore propose a new task, Multimedia Generative Script Learning, which generates subsequent steps by tracking historical states in both the text and vision modalities, and we present the first benchmark for it, containing 2,338 tasks and 31,496 steps with descriptive images. We aim to generate scripts that are visual-state trackable, inductive for unseen tasks, and diverse in their individual steps. To this end, we encode visual state changes with a multimedia selective encoder, transfer knowledge from previously observed tasks with a retrieval-augmented decoder, and present distinct information at each step by optimizing a diversity-oriented contrastive learning objective. We define metrics to evaluate both generation quality and inductive quality. Experimental results demonstrate that our approach significantly outperforms strong baselines.
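The abstract does not give the form of the diversity-oriented contrastive objective. As a rough illustration of the general idea only, the sketch below assumes a generic InfoNCE-style loss in which the gold next step is the positive and the previously generated (historical) steps are the negatives, so repeating an earlier step is penalized. The function names, the cosine similarity, and the temperature value are all hypothetical choices for this sketch, not the authors' formulation.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def diversity_contrastive_loss(
    generated: Sequence[float],
    target: Sequence[float],
    history: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Hypothetical InfoNCE-style loss (assumption, not the paper's exact objective).

    The target step embedding is the positive; embeddings of earlier steps
    are negatives, pushing the generated step away from what was already said.
    """
    logits = [cosine(generated, target) / temperature]
    logits += [cosine(generated, h) / temperature for h in history]
    # Numerically stable log-sum-exp over positive + negatives.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability assigned to the positive (target) step.
    return log_denominator - logits[0]


# Usage: a generated step that matches the target scores a much lower loss
# than one that repeats a historical step.
history_steps = [[0.0, 1.0]]
print(diversity_contrastive_loss([1.0, 0.0], [1.0, 0.0], history_steps))
print(diversity_contrastive_loss([0.0, 1.0], [1.0, 0.0], history_steps))
```

Treating earlier steps as negatives is one simple way to encode "distinct information at each step"; a real system would compute these embeddings with the model's decoder and batch the computation.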

updated: Fri May 26 2023 04:57:22 GMT+0000 (UTC)

published: Thu Aug 25 2022 19:04:28 GMT+0000 (UTC)

arXiv