Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding

Minyoung Hwang; Jaeyeon Jeong; Minsoo Kim; Yoonseon Oh; Songhwai Oh

The main challenge in vision-and-language navigation (VLN) is understanding natural-language instructions in an unseen environment. The main limitation of conventional VLN algorithms is that a single mistaken action can cause the agent to stop following the instructions or to explore unnecessary regions, leading it onto an irrecoverable path. To tackle this problem, we propose Meta-Explore, a hierarchical navigation method that deploys an exploitation policy to correct recent misled actions. We show that an exploitation policy that moves the agent toward a well-chosen local goal among unvisited but observable states outperforms a method that moves the agent back to a previously visited state. We also highlight the need to imagine regretful explorations using semantically meaningful clues. The key to our approach is understanding the object placements around the agent in the spectral domain. Specifically, we present a novel visual representation, called scene object spectrum (SOS), which applies a category-wise 2D Fourier transform to detected objects. By combining the exploitation policy with SOS features, the agent can correct its path by choosing a promising local goal. We evaluate our method on three VLN benchmarks: R2R, SOON, and REVERIE. Meta-Explore outperforms other baselines and shows significant generalization performance. In addition, local goal search using the proposed spectral-domain SOS features improves the success rate by 17.1% and SPL by 20.6% on the SOON benchmark.
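To make the idea of a category-wise 2D Fourier transform concrete, the following is a minimal sketch and not the authors' implementation. It assumes detections arrive as bounding boxes on a fixed-size grid, rasterizes one binary mask per object category, and keeps a small block of low-frequency magnitudes per category. The function name `scene_object_spectrum`, the box format, the grid size, and the `keep` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def scene_object_spectrum(boxes, num_categories, height=32, width=64, keep=4):
    """Illustrative SOS-style feature (assumed design, not from the paper).

    boxes: iterable of (category_id, x0, y0, x1, y1) in grid coordinates,
           with x1/y1 exclusive.
    Returns an array of shape (num_categories, keep * keep).
    """
    # One binary occupancy mask per object category.
    masks = np.zeros((num_categories, height, width), dtype=np.float64)
    for cat, x0, y0, x1, y1 in boxes:
        masks[cat, y0:y1, x0:x1] = 1.0

    # 2D FFT applied independently to each category's mask.
    spectrum = np.fft.fft2(masks, axes=(-2, -1))

    # Log-compressed magnitude; phase is discarded in this sketch.
    magnitude = np.log1p(np.abs(spectrum))

    # Keep a small block of low-frequency coefficients per category.
    return magnitude[:, :keep, :keep].reshape(num_categories, -1)


# Example: two detections across three categories (category 1 is absent).
features = scene_object_spectrum(
    [(0, 2, 3, 10, 7), (2, 20, 10, 30, 20)], num_categories=3
)
print(features.shape)  # (3, 16)
```

In this sketch a category with no detections maps to an all-zero row, and the first coefficient of each row encodes the total mask area of that category. Discarding the phase is a simplification chosen here; how the actual SOS features are truncated or encoded is described in the paper itself, not in the abstract.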

updated: Tue Mar 07 2023 17:39:53 GMT+0000 (UTC)

published: Tue Mar 07 2023 17:39:53 GMT+0000 (UTC)

arXiv
