Learning for Visual Navigation by Imagining the Success

Mahdi Kazemi Moghaddam; Ehsan Abbasnejad; Qi Wu; Javen Shi; Anton Van Den Hengel


Visual navigation is often cast as a reinforcement learning (RL) problem. Current methods typically produce a suboptimal policy that learns general obstacle-avoidance and search behaviours. For example, in the target-object navigation setting, policies learnt by traditional methods often fail to complete the task, even when the target is clearly within reach from a human perspective. To address this issue, we propose to learn to imagine a latent representation of the successful (sub-)goal state. To do so, we have developed a module that we call Foresight Imagination (ForeSIT). ForeSIT is trained to imagine the recurrent latent representation of a future state that leads to success, e.g., either a sub-goal state that is important to reach before the target, or the goal state itself. By conditioning the policy on the generated imagination during training, our agent learns how to use this imagination to achieve its goal robustly. Our agent is able to imagine what the (sub-)goal state may look like (in the latent space) and can learn to navigate towards that state. We develop an efficient learning algorithm to train ForeSIT in an on-policy manner and integrate it into our RL objective. The integration is not trivial because the imagination module and the policy share a state representation that is constantly evolving during training. Empirically, we observe that our method outperforms state-of-the-art methods by a large margin in the commonly accepted AI2THOR benchmark environment. Our method can be readily integrated into other model-free RL navigation frameworks.
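The abstract describes two coupled pieces: an imagination module that predicts the latent of a future successful state, and a policy conditioned on that prediction. The toy Python sketch below illustrates only that coupling, under stated assumptions. The linear imagination map, the mean-squared-error loss against a target latent taken from a successful trajectory, the softmax linear policy, and the names `ForesightSketch` and `policy_probs` are all hypothetical choices for illustration. They are not the ForeSIT architecture or training algorithm from the paper.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ForesightSketch:
    """Hypothetical toy imagination module (NOT the paper's ForeSIT).

    A single linear map from the agent's current recurrent state h_t to an
    imagined latent z_hat of a (sub-)goal state that led to success.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

    def imagine(self, h):
        return matvec(self.W, h)

    def loss(self, h, h_success):
        # Mean squared error between the imagined latent and the recurrent
        # latent recorded at a successful state (assumed to be available from
        # an on-policy rollout that reached the target).
        z = self.imagine(h)
        return sum((zi - ti) ** 2 for zi, ti in zip(z, h_success)) / len(z)

    def sgd_step(self, h, h_success, lr=0.1):
        # h_success is treated as a fixed target (no gradient flows into it).
        z = self.imagine(h)
        n = len(z)
        for i in range(n):
            g = 2.0 * (z[i] - h_success[i]) / n  # dLoss/dz_i
            for j in range(len(h)):
                self.W[i][j] -= lr * g * h[j]  # chain rule: dz_i/dW_ij = h_j


def policy_probs(theta, h, z_hat):
    """Hypothetical softmax policy conditioned on both h_t and the imagination."""
    x = h + z_hat  # list concatenation [h_t ; z_hat]
    logits = matvec(theta, x)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    h_t = [1.0, 0.5, -0.5, 0.2]     # made-up current recurrent state
    h_goal = [0.3, -0.2, 0.8, 0.1]  # made-up latent of a successful state
    module = ForesightSketch(dim=4)
    print("loss before:", module.loss(h_t, h_goal))
    for _ in range(200):
        module.sgd_step(h_t, h_goal)
    print("loss after:", module.loss(h_t, h_goal))
    theta = [[0.1] * 8 for _ in range(3)]  # 3 hypothetical actions
    print(policy_probs(theta, h_t, module.imagine(h_t)))
```

In the paper's setting, the target latent would come from the same recurrent encoder that the policy is still training. This is the "constantly evolving" shared representation the abstract mentions. A common way to stabilize such a moving target is to treat it as a constant when computing the imagination loss, as the sketch does. The abstract does not say how ForeSIT handles this.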

updated: Sun Feb 28 2021 10:25:46 GMT+0000 (UTC)

published: Sun Feb 28 2021 10:25:46 GMT+0000 (UTC)

arXiv
