Virtual Sparse Convolution for Multimodal 3D Object Detection

Hai Wu; Chenglu Wen; Shaoshuai Shi; Xin Li; Cheng Wang

Recently, virtual/pseudo-point-based 3D object detection, which seamlessly fuses RGB images and LiDAR data through depth completion, has gained great attention. However, virtual points generated from an image are very dense, introducing a huge amount of redundant computation during detection. Meanwhile, noise introduced by inaccurate depth completion significantly degrades detection precision. This paper proposes a fast yet effective backbone, termed VirConvNet, based on a new operator, VirConv (Virtual Sparse Convolution), for virtual-point-based 3D object detection. VirConv consists of two key designs: (1) StVD (Stochastic Voxel Discard) and (2) NRConv (Noise-Resistant Submanifold Convolution). StVD alleviates the computation problem by discarding large amounts of nearby redundant voxels. NRConv tackles the noise problem by encoding voxel features in both 2D image space and 3D LiDAR space. By integrating VirConv, we first develop an efficient pipeline, VirConv-L, based on an early fusion design. Then, we build a high-precision pipeline, VirConv-T, based on a transformed refinement scheme. Finally, we develop a semi-supervised pipeline, VirConv-S, based on a pseudo-label framework. On the KITTI car 3D detection test leaderboard, our VirConv-L achieves 85% AP with a fast running speed of 56 ms. Our VirConv-T and VirConv-S attain high precision of 86.3% and 87.2% AP, and currently rank 2nd and 1st, respectively. The code is available at https://github.com/hailanyi/VirConv.
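
To make the idea behind StVD concrete, the following is a minimal sketch of randomly discarding nearby voxels while keeping distant ones. It is an illustration only, not the authors' implementation: the function name `stochastic_voxel_discard`, the `near_range` threshold, the `discard_rate` value, and the representation of voxels as plain (x, y, z) centers are all assumptions made for this example, and the abstract does not specify any of them.

```python
import random


def stochastic_voxel_discard(voxels, near_range=30.0, discard_rate=0.9, seed=None):
    """Randomly drop a fraction of nearby voxels; keep every distant voxel.

    voxels:       list of (x, y, z) voxel centers in metres, sensor at the origin
                  (hypothetical representation for this sketch).
    near_range:   assumed distance threshold below which a voxel counts as "nearby".
    discard_rate: assumed probability of dropping each nearby voxel.
    """
    rng = random.Random(seed)
    kept = []
    for x, y, z in voxels:
        distance = (x * x + y * y + z * z) ** 0.5
        # Only nearby voxels are candidates for discarding; they are dense
        # and largely redundant, whereas distant voxels are sparse.
        if distance < near_range and rng.random() < discard_rate:
            continue
        kept.append((x, y, z))
    return kept
```

Calling `stochastic_voxel_discard(voxels, seed=0)` on a list of voxel centers returns a thinned list in which every voxel at or beyond `near_range` is preserved. A fixed `seed` makes the sampling repeatable, which can help when comparing runs.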

updated: Sat Mar 04 2023 04:15:36 GMT+0000 (UTC)

published: Sat Mar 04 2023 04:15:36 GMT+0000 (UTC)

arXiv
