Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation

Peihao Chen; Dongyu Ji; Kunyang Lin; Runhao Zeng; Thomas H. Li; Mingkui Tan; Chuang Gan

We address the practical yet challenging problem of training robot agents to navigate an environment by following a path described in language instructions. The instructions often contain descriptions of objects in the environment. To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both the spatial locations and the semantic information of objects in the environment. However, enabling a robot to build a map that represents the environment well is extremely challenging, as the environment often involves diverse objects with various attributes. In this paper, we propose a multi-granularity map, which contains both fine-grained object details (e.g., color, texture) and semantic classes, to represent objects more comprehensively. Moreover, we propose a weakly-supervised auxiliary task that requires the agent to localize instruction-relevant objects on the map. Through this task, the agent not only learns to localize the instruction-relevant objects for navigation but is also encouraged to learn a better map representation that reveals object information. We then feed the learned map and the instruction to a waypoint predictor to determine the next navigation goal. Experimental results show that our method outperforms the state of the art by 4.0% and 4.6% in success rate in seen and unseen environments, respectively, on the VLN-CE dataset. Code is available at https://github.com/PeihaoChen/WS-MGMap.
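
As a rough illustration of what a multi-granularity map could look like as a data structure, the sketch below stores, for every map cell, a fine-grained feature vector alongside a one-hot semantic class. The abstract does not say how the two granularities are combined, so the channel-wise concatenation, the function name `build_multi_granularity_map`, and the toy values are all assumptions made for this example; this is not the authors' implementation, which is available in the linked repository.

```python
from typing import List

# A top-down grid map indexed as [row][col][channel].
Grid = List[List[List[float]]]


def one_hot(class_id: int, num_classes: int) -> List[float]:
    """Encode a semantic class id as a one-hot vector."""
    if not 0 <= class_id < num_classes:
        raise ValueError(f"class id {class_id} outside [0, {num_classes})")
    return [1.0 if k == class_id else 0.0 for k in range(num_classes)]


def build_multi_granularity_map(
    fine_features: Grid,
    semantic_ids: List[List[int]],
    num_classes: int,
) -> Grid:
    """Concatenate per-cell fine-grained features with a one-hot semantic class.

    Hypothetical fusion rule (an assumption for this sketch): each output cell
    is [fine-grained channels..., one-hot class channels...].
    """
    if len(fine_features) != len(semantic_ids):
        raise ValueError("feature grid and class grid differ in height")
    fused: Grid = []
    for feat_row, id_row in zip(fine_features, semantic_ids):
        if len(feat_row) != len(id_row):
            raise ValueError("feature grid and class grid differ in width")
        fused.append(
            [list(feat) + one_hot(cid, num_classes)
             for feat, cid in zip(feat_row, id_row)]
        )
    return fused


# Toy 2x2 map: two fine-grained channels per cell, three semantic classes.
fine = [[[0.9, 0.1], [0.2, 0.8]],
        [[0.5, 0.5], [0.0, 1.0]]]
ids = [[0, 2],
       [1, 2]]

mg_map = build_multi_granularity_map(fine, ids, num_classes=3)
print(len(mg_map), len(mg_map[0]), len(mg_map[0][0]))  # prints: 2 2 5
```

In this toy layout, a downstream component such as a waypoint predictor would see both kinds of information in every cell, which is the property the abstract attributes to the multi-granularity map.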

updated: Fri Oct 14 2022 04:23:27 GMT+0000 (UTC)

published: Fri Oct 14 2022 04:23:27 GMT+0000 (UTC)

arXiv
