MOCA: A Modular Object-Centric Approach for Interactive Instruction Following

Kunal Pratap Singh; Suvaansh Bhambri; Byeonghwi Kim; Roozbeh Mottaghi; Jonghyun Choi

Performing simple household tasks based on language directives is very natural to humans, yet it remains an open challenge for AI agents. Recently, an 'interactive instruction following' task has been proposed to foster research in reasoning over long instruction sequences that require object interactions in a simulated environment. It involves solving open problems from the vision, language, and navigation literature at each step. To address this multifaceted problem, we propose a modular architecture that decouples the task into visual perception and action policy, and name it MOCA, a Modular Object-Centric Approach. We evaluate our method on the ALFRED benchmark and empirically validate that it outperforms prior art by significant margins in all metrics, with good generalization performance (high success rate in unseen environments). Our code is available at https://github.com/gistvision/moca.
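The abstract does not describe MOCA's internals, so the following is only a minimal, hypothetical Python sketch of the general idea of decoupling an instruction-following agent into an action-policy stream (what to do) and a visual-perception stream (which object to act on). Everything in it is an assumption for illustration: the class names `ActionPolicy`, `InteractionPerception`, and `Detection`, the `step` function, the illustrative action names, and the keyword matching that stands in for learned models. It is not the authors' implementation; the linked repository is the authoritative source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Illustrative navigation actions (assumed names, loosely AI2-THOR style).
NAV_ACTIONS = {"MoveAhead", "RotateLeft", "RotateRight", "LookUp", "LookDown"}


@dataclass
class Detection:
    """One candidate object from a hypothetical instance detector."""
    label: str
    score: float
    mask_id: int  # placeholder for a pixel mask


class ActionPolicy:
    """Hypothetical policy stream: decides WHAT to do next.

    Keyword matching stands in for a learned model.
    """

    def __init__(self, keyword_to_action: Dict[str, str]) -> None:
        self.keyword_to_action = keyword_to_action

    def predict(self, instruction: str) -> str:
        text = instruction.lower()
        for keyword, action in self.keyword_to_action.items():
            if keyword in text:
                return action
        return "MoveAhead"  # default: keep navigating


class InteractionPerception:
    """Hypothetical perception stream: decides WHICH object to act on."""

    def __init__(self, vocabulary: List[str]) -> None:
        self.vocabulary = vocabulary

    def target_class(self, instruction: str) -> Optional[str]:
        # Predict the target object class from the instruction text.
        text = instruction.lower()
        for name in self.vocabulary:
            if name in text:
                return name
        return None

    def localize(self, instruction: str,
                 detections: List[Detection]) -> Optional[Detection]:
        target = self.target_class(instruction)
        candidates = [d for d in detections if d.label == target]
        if not candidates:
            return None
        # Pick the most confident detected instance of the predicted class.
        return max(candidates, key=lambda d: d.score)


def step(policy: ActionPolicy, perception: InteractionPerception,
         instruction: str,
         detections: List[Detection]) -> Tuple[str, Optional[Detection]]:
    """Combine both streams: navigation needs no mask; interaction does."""
    action = policy.predict(instruction)
    if action in NAV_ACTIONS:
        return action, None
    return action, perception.localize(instruction, detections)
```

For example, with a policy that maps "pick up" to `PickupObject` and detections containing two apples scored 0.6 and 0.9, `step` on "Pick up the apple on the table" returns `PickupObject` together with the 0.9 apple, while "Walk to the counter" returns a navigation action and no object. Keeping the two streams separate means the policy never has to reason about pixels and the perception stream never has to choose actions, which is the kind of separation the abstract's "decouples the task" refers to at a high level.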

updated: Sat May 29 2021 15:49:23 GMT+0000 (UTC)

published: Sun Dec 06 2020 07:59:22 GMT+0000 (UTC)

arXiv
