BEVDetNet: Bird's Eye View LiDAR Point Cloud based Real-time 3D Object Detection for Autonomous Driving

Sambit Mohapatra; Senthil Yogamani; Heinrich Gotzig; Stefan Milz; Patrick Mader



3D object detection based on LiDAR point clouds is a crucial module in autonomous driving, particularly for long-range sensing. Most research focuses on achieving higher accuracy, and the resulting models are not optimized for deployment on embedded systems in terms of latency and power efficiency. In high-speed driving scenarios, latency is a crucial parameter, because lower latency leaves more time to react to dangerous situations. This module typically uses a voxel- or point-cloud-based 3D convolution approach, which has two drawbacks. First, such approaches are inefficient on embedded platforms because they are not well suited to efficient parallelization. Second, their runtime varies with the sparsity of the scene, which conflicts with the determinism required in a safety system. In this work, we aim to develop a very low-latency algorithm with a fixed runtime. We propose a novel semantic segmentation architecture as a single unified model for object center detection using keypoints, box prediction, and orientation prediction using binned classification, all in a simpler Bird's Eye View (BEV) 2D representation. The proposed architecture can be trivially extended to include semantic segmentation classes such as road without any additional computation. The proposed model has a latency of 4 ms on the embedded Nvidia Xavier platform. It is 5x faster than other top-accuracy models, with a minimal accuracy degradation of 2% in Average Precision at IoU = 0.5 on the KITTI dataset.
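Two ideas in the abstract, replacing sparse 3D voxels with a fixed-size 2D BEV grid and predicting orientation as a bin classification, can be illustrated with a minimal standard-library Python sketch. This is not the authors' implementation: the grid ranges, the 0.5 m cell size, the single point-count channel, the eight orientation bins and the per-bin residual are all hypothetical choices, since the abstract specifies none of them. The names `bev_occupancy`, `encode_yaw` and `decode_yaw` are illustrative only.

```python
import math

# Hypothetical BEV configuration: the abstract does not state ranges or resolution.
X_RANGE = (0.0, 40.0)    # forward extent in metres (assumption)
Y_RANGE = (-20.0, 20.0)  # lateral extent in metres (assumption)
CELL = 0.5               # cell size in metres (assumption)
NUM_BINS = 8             # number of orientation bins (assumption)
TWO_PI = 2 * math.pi


def bev_occupancy(points, x_range=X_RANGE, y_range=Y_RANGE, cell=CELL):
    """Count LiDAR points per cell of a fixed-size BEV grid.

    The grid shape depends only on the ranges and cell size, not on how
    many points the scan holds, so a dense 2D CNN applied to it does the
    same amount of work on every frame.
    """
    rows = int(round((x_range[1] - x_range[0]) / cell))
    cols = int(round((y_range[1] - y_range[0]) / cell))
    grid = [[0] * cols for _ in range(rows)]
    for x, y, _z in points:  # height is ignored in this single-channel sketch
        r = int((x - x_range[0]) // cell)
        c = int((y - y_range[0]) // cell)
        if 0 <= r < rows and 0 <= c < cols:  # drop points outside the grid
            grid[r][c] += 1
    return grid


def encode_yaw(yaw, num_bins=NUM_BINS):
    """Map a yaw angle in radians to (bin index, residual from the bin centre)."""
    width = TWO_PI / num_bins
    yaw %= TWO_PI
    if yaw >= TWO_PI:  # float rounding can give exactly 2*pi for tiny negative inputs
        yaw = 0.0
    idx = min(int(yaw // width), num_bins - 1)
    residual = yaw - (idx + 0.5) * width  # signed offset from the bin centre
    return idx, residual


def decode_yaw(idx, residual, num_bins=NUM_BINS):
    """Invert encode_yaw: bin centre plus residual, wrapped into [0, 2*pi)."""
    width = TWO_PI / num_bins
    return ((idx + 0.5) * width + residual) % TWO_PI
```

An empty scan and a dense one both yield an 80 x 80 grid under these settings, which is the kind of input-independent cost a fixed-runtime design relies on. The residual is a common companion to bin classification so that a few coarse classes can still recover a continuous angle; the abstract mentions only the classification, so treat the residual as an assumption.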

updated: Sun Jul 11 2021 00:42:38 GMT+0000 (UTC)

published: Wed Apr 21 2021 22:06:39 GMT+0000 (UTC)

arXiv


