CoBEVT: Cooperative Bird's Eye View Semantic Segmentation with Sparse Transformers

Runsheng Xu; Zhengzhong Tu; Hao Xiang; Wei Shao; Bolei Zhou; Jiaqi Ma

Bird's eye view (BEV) semantic segmentation plays a crucial role in spatial sensing for autonomous driving. Although recent work has made significant progress on BEV map understanding, existing methods are all based on single-agent camera systems. These solutions sometimes struggle to handle occlusions or detect distant objects in complex traffic scenes. Vehicle-to-Vehicle (V2V) communication technologies enable autonomous vehicles to share sensing information, dramatically improving perception performance and range compared to single-agent systems. In this paper, we propose CoBEVT, the first generic multi-agent multi-camera perception framework that can cooperatively generate BEV map predictions. To efficiently fuse camera features from multi-view and multi-agent data in an underlying Transformer architecture, we design a fused axial attention (FAX) module, which sparsely captures local and global spatial interactions across views and agents. Extensive experiments on the V2V perception dataset OPV2V demonstrate that CoBEVT achieves state-of-the-art performance for cooperative BEV semantic segmentation. Moreover, CoBEVT is shown to generalize to other tasks, including 1) BEV segmentation with single-agent multi-camera systems and 2) 3D object detection with multi-agent LiDAR systems, achieving state-of-the-art performance with real-time inference speed. The code is available at https://github.com/DerrickXuNu/CoBEVT.
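The abstract does not spell out how FAX partitions tokens, so the following is only an illustrative sketch of the general idea of sparse local and global attention, not the authors' implementation. It assumes that local attention groups tokens into non-overlapping spatial windows, that sparse global attention groups tokens sampled at a uniform stride, and that each group spans all agents. The function names, the `window` and `grid` parameters, and the toy sizes are all hypothetical.

```python
def local_window_groups(num_agents, height, width, window):
    """Group tokens into non-overlapping window x window blocks (assumed local attention).

    Each group holds the tokens of every agent inside one spatial block.
    Assumes height and width are divisible by window.
    """
    groups = {}
    for agent in range(num_agents):
        for row in range(height):
            for col in range(width):
                key = (row // window, col // window)
                groups.setdefault(key, []).append((agent, row, col))
    return list(groups.values())


def sparse_grid_groups(num_agents, height, width, grid):
    """Group tokens sampled at a uniform stride (assumed sparse global attention).

    Each group holds grid x grid positions per agent, spread over the whole map.
    Assumes height and width are divisible by grid.
    """
    stride_r, stride_c = height // grid, width // grid
    groups = {}
    for agent in range(num_agents):
        for row in range(height):
            for col in range(width):
                key = (row % stride_r, col % stride_c)
                groups.setdefault(key, []).append((agent, row, col))
    return list(groups.values())


def attention_pairs(groups):
    """Count query-key pairs when attention is computed only within each group."""
    return sum(len(group) ** 2 for group in groups)


# Toy setting: 2 agents sharing a 4 x 4 feature map.
local = local_window_groups(num_agents=2, height=4, width=4, window=2)
sparse = sparse_grid_groups(num_agents=2, height=4, width=4, grid=2)
full_pairs = (2 * 4 * 4) ** 2  # dense attention over all 32 tokens
print(attention_pairs(local), attention_pairs(sparse), full_pairs)
```

In this toy setting each scheme attends within 4 groups of 8 tokens, so the pair count is a quarter of that of dense attention over all 32 tokens. The local groups cover neighbouring cells, while the strided groups reach across the whole map.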

updated: Sun Sep 25 2022 07:19:32 GMT+0000 (UTC)

published: Tue Jul 05 2022 17:59:28 GMT+0000 (UTC)

arXiv
