Building Intelligent Autonomous Navigation Agents

Devendra Singh Chaplot

Breakthroughs in machine learning over the last decade have led to 'digital intelligence', i.e., machine learning models capable of learning from vast amounts of labeled data to perform digital tasks such as speech recognition, face recognition, and machine translation. The goal of this thesis is to make progress towards designing algorithms capable of 'physical intelligence', i.e., building intelligent autonomous navigation agents capable of learning to perform complex navigation tasks in the physical world involving visual perception, natural language understanding, reasoning, planning, and sequential decision making. Despite several advances in classical navigation methods in the last few decades, current navigation agents struggle with long-term semantic navigation tasks. In the first part of the thesis, we discuss our work on short-term navigation using end-to-end reinforcement learning to tackle challenges such as obstacle avoidance, semantic perception, language grounding, and reasoning. In the second part, we present a new class of navigation methods based on modular learning and structured explicit map representations, which leverage the strengths of both classical and end-to-end learning methods to tackle long-term navigation tasks. We show that these methods effectively tackle challenges such as localization, mapping, long-term planning, exploration, and learning semantic priors. These modular learning methods are capable of long-term spatial and semantic understanding and achieve state-of-the-art results on various navigation tasks.

updated: Fri Jun 25 2021 04:10:58 GMT+0000 (UTC)

published: Fri Jun 25 2021 04:10:58 GMT+0000 (UTC)

arXiv

