A Dual Semantic-Aware Recurrent Global-Adaptive Network For Vision-and-Language Navigation

Liuyi Wang; Zongtao He; Jiagui Tang; Ronghao Dang; Naijia Wang; Chengju Liu; Qijun Chen

Vision-and-Language Navigation (VLN) is a realistic but challenging task that requires an agent to locate a target region using verbal and visual cues. Despite significant recent progress, two broad limitations remain: (1) explicit mining of the important guiding semantics concealed in both vision and language is still under-explored; (2) previous structured-map methods represent each visited node by its average historical appearance, ignoring both the distinct contributions of individual images and the retention of potent information during reasoning. This work proposes a dual semantic-aware recurrent global-adaptive network (DSRG) to address these problems. First, DSRG introduces an instruction-guidance linguistic module (IGL) and an appearance-semantics visual module (ASV) to strengthen language and vision semantic learning, respectively. For the memory mechanism, a global adaptive aggregation module (GAA) is devised for explicit panoramic observation fusion, and a recurrent memory fusion module (RMF) is introduced to supply implicit temporal hidden states. Extensive experiments on the R2R and REVERIE datasets demonstrate that our method outperforms existing methods. Code is available at https://github.com/CrystalSixone/DSRG.
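
The contrast the abstract draws between averaging the views of a visited node and aggregating them adaptively can be illustrated with a minimal NumPy sketch. This is not the authors' GAA module, whose actual design is in the paper and the linked repository. It only shows generic softmax-weighted pooling next to uniform mean pooling. The function names, the `query` vector (standing in for learned parameters), the 36-view panorama, and the 8-dimensional features are all assumptions made for illustration.

```python
import numpy as np


def mean_aggregate(views):
    """Uniform average over the view axis: every image contributes equally."""
    return views.mean(axis=0)


def adaptive_aggregate(views, query):
    """Softmax-weighted sum of view features.

    `query` is a hypothetical vector standing in for learned parameters;
    each view's weight comes from its scaled dot product with `query`.
    """
    scores = views @ query / np.sqrt(views.shape[1])
    scores = scores - scores.max()  # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ views, weights


# Illustrative data only: 36 views per panorama and 8-dim features are assumed.
rng = np.random.default_rng(0)
views = rng.normal(size=(36, 8))
query = rng.normal(size=8)

node_mean = mean_aggregate(views)
node_adaptive, weights = adaptive_aggregate(views, query)
```

With an all-zero `query` every view receives the same weight and the adaptive result reduces to the plain mean, so mean pooling is the special case in which no view is treated as more informative than another.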

updated: Tue May 30 2023 02:33:12 GMT+0000 (UTC)

published: Fri May 05 2023 15:06:08 GMT+0000 (UTC)

arXiv
