ViT-BEVSeg: A Hierarchical Transformer Network for Monocular Birds-Eye-View Segmentation

Pramit Dutta; Ganesh Sistu; Senthil Yogamani; Edgar Galván; John McDonald

Generating a detailed near-field perceptual model of the environment is an important and challenging problem in both self-driving vehicles and autonomous mobile robotics. A Bird's-Eye-View (BEV) map is a commonly used panoptic representation: it provides a simplified 2D view of the vehicle's surroundings with accurate semantic-level segmentation for many downstream tasks. Current state-of-the-art approaches to generating BEV maps employ a Convolutional Neural Network (CNN) backbone to create feature maps, which are passed through a spatial transformer that projects the derived features onto the BEV coordinate frame. In this paper, we evaluate the use of vision transformers (ViT) as a backbone architecture for generating BEV maps. Our network architecture, ViT-BEVSeg, employs standard vision transformers to generate a multi-scale representation of the input image. The resulting representation is then provided as input to a spatial transformer decoder module, which outputs segmentation maps in the BEV grid. We evaluate our approach on the nuScenes dataset, demonstrating a considerable improvement in performance relative to state-of-the-art approaches.
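
The data flow described in the abstract (image, ViT tokens, multi-scale feature maps, BEV grid) can be made concrete with a small shape-bookkeeping sketch. This is only an illustration under stated assumptions, not the authors' implementation: the patch size, strides, channel count, image size, and BEV extents below are hypothetical values chosen for the example, and the function names are invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureMap:
    stride: int    # downsampling factor relative to the input image
    height: int
    width: int
    channels: int


def vit_token_grid(image_h, image_w, patch=16):
    """Token grid of a standard ViT: one token per non-overlapping patch."""
    if image_h % patch or image_w % patch:
        raise ValueError("image size must be divisible by the patch size")
    return image_h // patch, image_w // patch


def multiscale_shapes(image_h, image_w, patch=16,
                      strides=(4, 8, 16, 32), channels=256):
    """Shapes of a multi-scale representation, assuming the ViT tokens are
    reshaped to a 2D grid and resampled to each stride.
    The strides and channel count are hypothetical, not from the paper."""
    vit_token_grid(image_h, image_w, patch)  # validates the image size
    return [FeatureMap(s, image_h // s, image_w // s, channels)
            for s in strides]


def bev_grid(extent_x_m, extent_z_m, resolution_m):
    """Number of BEV cells (depth, lateral) for a given metric extent.
    Extent and resolution are assumptions chosen for illustration."""
    return round(extent_z_m / resolution_m), round(extent_x_m / resolution_m)


if __name__ == "__main__":
    # Hypothetical 512x768 input image.
    print("token grid:", vit_token_grid(512, 768))
    for fmap in multiscale_shapes(512, 768):
        print(fmap)
    # Hypothetical 50 m x 50 m BEV area at 0.25 m per cell.
    print("BEV grid:", bev_grid(50.0, 50.0, 0.25))
```

In this sketch, each feature map would be handed to the spatial transformer decoder, which is responsible for mapping image-plane features onto the cells of the BEV grid; that projection step is not modeled here.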

updated: Tue May 31 2022 10:18:36 GMT+0000 (UTC)

published: Tue May 31 2022 10:18:36 GMT+0000 (UTC)

arXiv
