MVFuseNet: Improving End-to-End Object Detection and Motion Forecasting through Multi-View Fusion of LiDAR Data

Ankit Laddha; Shivam Gautam; Stefan Palombo; Shreyash Pandey; Carlos Vallespi-Gonzalez

In this work, we propose MVFuseNet, a novel end-to-end method for joint object detection and motion forecasting from a temporal sequence of LiDAR data. Most existing methods operate in a single view, projecting the data into either the range view (RV) or the bird's-eye view (BEV). In contrast, we propose a method that effectively utilizes both RV and BEV for spatio-temporal feature learning as part of a temporal fusion network, as well as for multi-scale feature learning in the backbone network. Further, we propose a novel sequential fusion approach that effectively utilizes multiple views in the temporal fusion network. We show the benefits of our multi-view approach for the tasks of detection and motion forecasting on two large-scale self-driving datasets, achieving state-of-the-art results. Furthermore, we show that MVFuseNet scales well to large operating ranges while maintaining real-time performance.
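To make the two views mentioned in the abstract concrete, the following is a minimal sketch of how a single LiDAR point might be projected into a BEV grid and into an RV image. It is not the authors' implementation: the function names `project_to_bev` and `project_to_rv`, the grid extents, the cell size, the image resolution, and the elevation range are all assumptions chosen for illustration.

```python
import math

def project_to_bev(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), cell=0.5):
    """Map each (x, y, z) point to a (row, col) cell of a top-down grid.

    Extents and cell size (in metres) are illustrative assumptions.
    Points outside the grid map to None.
    """
    cells = []
    for x, y, _z in points:
        inside = x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]
        if not inside:
            cells.append(None)
            continue
        row = int((x - x_range[0]) / cell)
        col = int((y - y_range[0]) / cell)
        cells.append((row, col))
    return cells

def project_to_rv(points, width=360, height=32, elev_range_deg=(-25.0, 5.0)):
    """Map each (x, y, z) point to (row, col, range) in a spherical range image.

    Image size and vertical field of view are illustrative assumptions.
    Points at the origin or outside the vertical field of view map to None.
    """
    lo = math.radians(elev_range_deg[0])
    hi = math.radians(elev_range_deg[1])
    pixels = []
    for x, y, z in points:
        rng = math.sqrt(x * x + y * y + z * z)
        if rng == 0.0:
            pixels.append(None)
            continue
        azimuth = math.atan2(y, x)        # horizontal angle in [-pi, pi]
        elevation = math.asin(z / rng)    # vertical angle in [-pi/2, pi/2]
        if not (lo <= elevation < hi):
            pixels.append(None)
            continue
        col = int((azimuth + math.pi) / (2.0 * math.pi) * width) % width
        row = min(int((hi - elevation) / (hi - lo) * height), height - 1)
        pixels.append((row, col, rng))
    return pixels

# One point 10 m ahead, one far outside the BEV grid, one directly overhead.
points = [(10.0, 0.0, 0.0), (100.0, 0.0, 0.0), (0.0, 0.0, 10.0)]
bev = project_to_bev(points)
rv = project_to_rv(points)
```

The sketch illustrates the trade-off between the views: the BEV grid preserves metric distances but discards points outside its fixed extent, while the RV image keeps every return within the sensor's field of view and indexes it by angle rather than by position.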

updated: Wed Apr 21 2021 21:29:08 GMT+0000 (UTC)

published: Wed Apr 21 2021 21:29:08 GMT+0000 (UTC)

arXiv
