Grounding Complex Navigational Instructions Using Scene Graphs

Michiel de Jong; Satyapriya Krishna; Anuva Agarwal

Training a reinforcement learning agent to carry out natural language instructions is limited by the available supervision, i.e., knowing when the instruction has been carried out. We adapt the CLEVR visual question answering dataset to generate complex natural language navigation instructions and accompanying scene graphs, yielding an environment-agnostic supervised dataset. To demonstrate the use of this dataset, we map the scenes to the VizDoom environment and use the gated-attention architecture to train an agent to carry out these more complex language instructions.

updated: Thu Jun 03 2021 05:45:21 GMT+0000 (UTC)

published: Thu Jun 03 2021 05:45:21 GMT+0000 (UTC)

arXiv
