MOCA: A Modular Object-Centric Approach for Interactive Instruction Following

Kunal Pratap Singh; Suvaansh Bhambri; Byeonghwi Kim; Roozbeh Mottaghi; Jonghyun Choi

Performing simple household tasks based on language directives is very natural to humans, yet it remains an open challenge for AI agents. Recently, an 'interactive instruction following' task has been proposed to foster research on reasoning over long instruction sequences that require object interactions in a simulated environment. At each step, the task involves open problems from the vision, language, and navigation literature. To address this multifaceted problem, we propose a modular architecture that decouples the task into visual perception and action policy, which we name MOCA, a Modular Object-Centric Approach. We evaluate our method on the ALFRED benchmark and empirically show that it outperforms prior art by significant margins on all metrics, with good generalization (a high success rate in unseen environments). Our code is available at https://github.com/gistvision/moca.
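As a rough illustration of what "decoupling the task into visual perception and action policy" can mean, the toy Python sketch below splits one agent step into two independent streams: one decides *what* to do from language, the other decides *which object* to act on from the current frame. This is not MOCA's implementation. Both learned components are replaced by trivial stand-ins (a keyword lookup and a highest-score detection pick). The ALFRED-style action names, the `VERB_TO_ACTION` table, and the functions `action_policy`, `perception`, and `agent_step` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumed ALFRED-style interaction action names, used only for illustration.
INTERACTION_ACTIONS = {"PickupObject", "PutObject", "OpenObject", "CloseObject",
                       "ToggleObjectOn", "ToggleObjectOff", "SliceObject"}

# Hypothetical verb-to-action table standing in for a learned action policy.
VERB_TO_ACTION = {"forward": "MoveAhead", "left": "RotateLeft", "right": "RotateRight",
                  "pick": "PickupObject", "put": "PutObject", "open": "OpenObject"}


@dataclass
class Step:
    action: str
    target: Optional[str] = None  # object instance id; only set for interactions


def action_policy(command: str) -> Tuple[str, Optional[str]]:
    """Stream 1 (toy): decide WHAT to do from language alone.
    Returns the action and, for interaction actions, the object class to look for."""
    words = command.lower().split()
    action = VERB_TO_ACTION[words[0]]
    target_class = words[1].capitalize() if action in INTERACTION_ACTIONS else None
    return action, target_class


def perception(detections: Dict[str, float], target_class: str) -> Optional[str]:
    """Stream 2 (toy): decide WHICH object to act on from the current frame.
    `detections` maps instance ids such as 'Apple_1' to confidence scores."""
    matches = {k: v for k, v in detections.items() if k.split("_")[0] == target_class}
    return max(matches, key=matches.get) if matches else None


def agent_step(command: str, detections: Dict[str, float]) -> Step:
    """Combine the two decoupled streams into a single environment step.
    Perception is only queried when the policy chooses an interaction."""
    action, target_class = action_policy(command)
    if target_class is None:
        return Step(action)
    return Step(action, perception(detections, target_class))


if __name__ == "__main__":
    frame = {"Apple_1": 0.91, "Apple_2": 0.35, "Fridge_1": 0.88}
    for cmd in ["forward", "pick apple", "open fridge"]:
        print(agent_step(cmd, frame))
```

The point of the split is that either stream can be swapped or retrained without touching the other: navigation steps never consult perception, while interaction steps delegate object selection entirely to it.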

updated: Sun Dec 06 2020 07:59:22 GMT+0000 (UTC)

published: Sun Dec 06 2020 07:59:22 GMT+0000 (UTC)

arXiv