LXL: LiDAR Excluded Lean 3D Object Detection with 4D Imaging Radar and Camera Fusion

Weiyi Xiong; Jianan Liu; Tao Huang; Qing-Long Han; Yuxuan Xia; Bing Zhu

As an emerging technology and a relatively affordable device, the 4D imaging radar has already been confirmed to be effective for 3D object detection in autonomous driving. Nevertheless, the sparsity and noisiness of 4D radar point clouds hinder further performance improvement, and in-depth studies of its fusion with other modalities are lacking. On the other hand, most camera-based perception methods geometrically transform the extracted image perspective-view features into the bird's-eye view via the "depth-based splatting" proposed in Lift-Splat-Shoot (LSS), and some researchers exploit other modalities such as LiDARs or ordinary automotive radars for enhancement. Recently, a few works have applied the "sampling" strategy for image view transformation, showing that it outperforms "splatting" even without image depth prediction. However, the potential of "sampling" has not been fully unleashed. In this paper, we investigate the "sampling" view transformation strategy for camera and 4D imaging radar fusion-based 3D object detection. In the proposed model, LXL, predicted image depth distribution maps and radar 3D occupancy grids are utilized to aid image view transformation, a strategy called "radar occupancy-assisted depth-based sampling". Experiments on the VoD and TJ4DRadSet datasets show that the proposed method outperforms existing 3D object detection methods by a significant margin without bells and whistles. Ablation studies demonstrate that our method performs the best among different enhancement settings.
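To make the "sampling" idea more concrete, the following is a minimal NumPy sketch of how depth distributions and radar occupancy might weight sampled image features. It is not the authors' implementation: the function name `sample_image_features`, the nearest-neighbor lookup, the camera-frame voxel layout, and the choice to combine the depth probability and radar occupancy by simple multiplication are all assumptions made for illustration.

```python
import numpy as np

def sample_image_features(feat, depth_prob, occupancy, voxel_centers, K, depth_bins):
    """Hypothetical sketch of occupancy-assisted, depth-based sampling.

    feat:          (C, H, W) image feature map
    depth_prob:    (D, H, W) per-pixel depth distribution over D bins
    occupancy:     (N,) radar occupancy probability per voxel, in [0, 1]
    voxel_centers: (N, 3) voxel centers in camera coordinates (z forward)
    K:             (3, 3) intrinsics scaled to the feature-map resolution
    depth_bins:    (D,) depth bin centers in meters
    Returns an (N, C) array of weighted voxel features.
    """
    _, H, W = feat.shape
    z = voxel_centers[:, 2]
    in_front = z > 1e-6
    safe_z = np.where(in_front, z, 1.0)  # avoid dividing by zero or negative depth

    # Pinhole projection of every voxel center onto the feature map.
    uvw = voxel_centers @ K.T
    u = np.rint(uvw[:, 0] / safe_z).astype(int)
    v = np.rint(uvw[:, 1] / safe_z).astype(int)
    valid = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u = np.clip(u, 0, W - 1)
    v = np.clip(v, 0, H - 1)

    # "Sampling": each voxel pulls the feature at its projected pixel.
    sampled = feat[:, v, u].T  # (N, C)

    # Depth probability at the bin closest to the voxel's depth.
    d_idx = np.abs(z[:, None] - depth_bins[None, :]).argmin(axis=1)
    p_depth = depth_prob[d_idx, v, u]  # (N,)

    # Assumed combination: multiply depth and occupancy weights,
    # and zero out voxels that fall outside the image.
    weight = p_depth * occupancy * valid
    return sampled * weight[:, None]
```

In contrast to "splatting", which pushes each pixel's feature forward along its ray, this formulation lets every voxel query the image, so no voxel inside the field of view is left empty; the depth and occupancy weights then suppress voxels that are unlikely to contain a surface.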

updated: Fri Jul 07 2023 13:20:59 GMT+0000 (UTC)

published: Mon Jul 03 2023 03:09:44 GMT+0000 (UTC)

arXiv
