LXL: LiDAR Excluded Lean 3D Object Detection with 4D Imaging Radar and Camera Fusion

Weiyi Xiong; Jianan Liu; Tao Huang; Qing-Long Han; Yuxuan Xia; Bing Zhu

As an emerging technology and a relatively affordable device, 4D imaging radar has already been shown to be effective for 3D object detection in autonomous driving. Nevertheless, the sparsity and noise of 4D radar point clouds hinder further performance improvement, and in-depth studies of its fusion with other modalities are lacking. Meanwhile, "sampling", a newer image view transformation strategy, has been applied in a few image-based detectors and shown to outperform the widely used "depth-based splatting" proposed in Lift-Splat-Shoot (LSS), even without image depth prediction. However, the potential of "sampling" has not been fully exploited. In this paper, we investigate the "sampling" view transformation strategy for 3D object detection based on camera and 4D imaging radar fusion. In the proposed LiDAR Excluded Lean (LXL) model, predicted image depth distribution maps and radar 3D occupancy grids are generated from image perspective view (PV) features and radar bird's eye view (BEV) features, respectively. They are sent to the core of LXL, called "radar occupancy-assisted depth-based sampling", to aid image view transformation. Introducing image depths and radar information enhances the "sampling" strategy and leads to more accurate view transformation. Experiments on the VoD and TJ4DRadSet datasets show that the proposed method outperforms state-of-the-art 3D object detection methods by a significant margin without bells and whistles. Ablation studies demonstrate that our method performs best among the different enhancement settings.
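To make the "sampling" idea in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. Each 3D voxel center is projected into the image, the PV feature at that pixel is sampled, and the sample is weighted by the predicted depth probability at the voxel's depth and by the radar occupancy of the voxel. The function name `sample_image_features`, the nearest-neighbour lookup, the pinhole camera model, and the choice to combine the two cues by multiplication are all assumptions made for illustration; the abstract does not specify how LXL combines them.

```python
import numpy as np

def sample_image_features(img_feat, depth_prob, occupancy, voxel_xyz, K, depth_bins):
    """Illustrative 'sampling' view transformation (hypothetical helper).

    img_feat:   (C, H, W) image perspective-view features
    depth_prob: (D, H, W) per-pixel depth distribution over D bins
    occupancy:  (N,) radar occupancy in [0, 1] for each voxel
    voxel_xyz:  (N, 3) voxel centers in the camera frame (z points forward)
    K:          (3, 3) pinhole camera intrinsics
    depth_bins: (D,) depth bin centers in meters
    returns:    (N, C) weighted image features, one row per voxel
    """
    C, H, W = img_feat.shape
    z = voxel_xyz[:, 2]
    valid = z > 1e-3                      # keep only voxels in front of the camera
    safe_z = np.where(valid, z, 1.0)      # avoid division by zero for invalid voxels

    # Pinhole projection of voxel centers to pixel coordinates.
    uvw = voxel_xyz @ K.T
    ui = np.rint(uvw[:, 0] / safe_z).astype(int)
    vi = np.rint(uvw[:, 1] / safe_z).astype(int)
    valid &= (ui >= 0) & (ui < W) & (vi >= 0) & (vi < H)
    ui = np.clip(ui, 0, W - 1)
    vi = np.clip(vi, 0, H - 1)

    # Nearest-neighbour sampling of the image features at the projected pixels.
    sampled = img_feat[:, vi, ui].T       # (N, C)

    # Depth probability of the bin closest to each voxel's depth.
    d_idx = np.abs(z[:, None] - depth_bins[None, :]).argmin(axis=1)
    p_depth = depth_prob[d_idx, vi, ui]   # (N,)

    # Assumed combination: product of depth probability and radar occupancy.
    weight = p_depth * occupancy * valid
    return sampled * weight[:, None]
```

In this sketch a voxel contributes image features only when it projects inside the image, the depth distribution supports its depth, and the radar occupancy marks it as likely occupied. A real model would more plausibly use bilinear or trilinear interpolation and a learned fusion of the two cues.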

updated: Sun Aug 27 2023 12:49:57 GMT+0000 (UTC)

published: Mon Jul 03 2023 03:09:44 GMT+0000 (UTC)

arXiv

