Trans4Map: Revisiting Holistic Bird's-Eye-View Mapping from Egocentric Images to Allocentric Semantics with Vision Transformers

Chang Chen; Jiaming Zhang; Kailun Yang; Kunyu Peng; Rainer Stiefelhagen

Humans have an innate ability to sense their surroundings: they can extract a spatial representation from egocentric perception and form an allocentric semantic map via spatial transformation and memory updating. However, endowing mobile agents with such a spatial sensing ability remains a challenge, due to two difficulties: (1) previous convolutional models are limited by their local receptive field and thus struggle to capture holistic long-range dependencies during observation; (2) the excessive computational budget required for success often leads to a separation of the mapping pipeline into stages, rendering the entire mapping process inefficient. To address these issues, we propose an end-to-end one-stage Transformer-based framework for Mapping, termed Trans4Map. Our egocentric-to-allocentric mapping process includes three steps: (1) the efficient transformer extracts contextual features from a batch of egocentric images; (2) the proposed Bidirectional Allocentric Memory (BAM) module projects egocentric features into the allocentric memory; (3) the map decoder parses the accumulated memory and predicts the top-down semantic segmentation map. In contrast to previous methods, Trans4Map achieves state-of-the-art results on the Matterport3D dataset, reducing the number of parameters by 67.2% while gaining improvements of +3.25% in mIoU and +4.09% in mBF1. Code at: https://github.com/jamycheung/Trans4Map.
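
The idea behind step (2), writing egocentric features into a top-down memory, can be illustrated with a minimal NumPy sketch. This is not the BAM module from the paper. The function name `project_to_memory`, the precomputed `cell_index` array (which map cell each pixel falls into, with -1 for pixels outside the map), and the element-wise maximum used to merge features are all assumptions made for illustration only.

```python
import numpy as np

def project_to_memory(memory, features, cell_index):
    """Scatter per-pixel features into a flat top-down memory (hypothetical helper).

    memory:     (num_cells, C) array, updated in place
    features:   (H, W, C) egocentric feature map
    cell_index: (H, W) integer map-cell id per pixel, -1 if the pixel is off the map
    """
    flat_feats = features.reshape(-1, features.shape[-1])
    flat_index = cell_index.reshape(-1)
    valid = flat_index >= 0
    # Assumption: pixels that land in the same cell are merged by element-wise max.
    # np.maximum.at applies the update unbuffered, so repeated indices are handled.
    np.maximum.at(memory, flat_index[valid], flat_feats[valid])
    return memory

# Toy example: a 2x2 image with one feature channel, projected onto a 2x2 map.
map_size, channels = 2, 1
memory = np.zeros((map_size * map_size, channels), dtype=np.float32)
features = np.array([[[1.0], [5.0]], [[3.0], [2.0]]], dtype=np.float32)
cell_index = np.array([[0, 0], [1, -1]])  # last pixel falls outside the map

memory = project_to_memory(memory, features, cell_index)
top_down = memory.reshape(map_size, map_size, channels)
```

Calling the function once per frame on the same `memory` accumulates observations over time, and a decoder would then read `top_down` to predict the semantic map. Because the memory starts at zero and is merged by maximum, negative feature values are discarded in this sketch.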

updated: Fri Oct 14 2022 13:23:28 GMT+0000 (UTC)

published: Wed Jul 13 2022 14:01:00 GMT+0000 (UTC)

arXiv
