3D Object Detection Combining Semantic and Geometric Features from Point Clouds

Hao Peng; Guofeng Tong; Zheng Li; Yaqi Wang; Yuyuan Shao

In this paper, we investigate the combination of voxel-based and point-based methods and propose SGNet, a novel end-to-end two-stage 3D object detector for point cloud scenes. Voxel-based methods voxelize the scene into regular grids, which can be processed by current advanced feature learning frameworks built on convolutional layers for semantic feature learning. Point-based methods, in contrast, better extract the geometric features of points because they preserve the original point coordinates. Combining the two is an effective solution for 3D object detection from point clouds. However, most current methods use a voxel-based detection head with anchors for final classification and localization. Although the preset anchors cover the entire scene, they are not well suited to point cloud detection tasks with larger scenes and multiple categories because of the limitation imposed by voxel size. In this paper, we propose a voxel-to-point module (VTPM) that captures both semantic and geometric features. The VTPM is a voxel-point-based module that ultimately performs 3D object detection in point space, which makes small objects easier to detect and avoids presetting anchors in the inference stage. In addition, we propose a Confidence Adjustment Module (CAM) with center-boundary-aware confidence attention to resolve the misalignment between predicted confidence scores and proposals during region of interest (RoI) selection. SGNet achieves state-of-the-art results for 3D object detection on the KITTI dataset, especially for small objects such as cyclists. As of September 19, 2021, SGNet ranked 1st on the KITTI benchmark in 3D and BEV detection of cyclists at the easy difficulty level, and 2nd in 3D detection of cyclists at the moderate level.
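The general voxel/point pattern the abstract describes (learn features on a regular grid, then carry them back to the raw points whose exact coordinates are preserved) can be illustrated with a minimal, hypothetical sketch in plain Python. This is not SGNet's VTPM. The helper names (`voxel_key`, `voxelize`, `voxel_features`, `voxel_to_point`), the centroid-and-count voxel descriptor (a stand-in for what a 3D convolutional backbone would learn), and the choice to append each point's offset from its voxel centroid are all assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


def voxel_key(p: Point, voxel_size: float) -> VoxelKey:
    # Integer grid index of the cell containing p (floor also handles negatives).
    return (math.floor(p[0] / voxel_size),
            math.floor(p[1] / voxel_size),
            math.floor(p[2] / voxel_size))


def voxelize(points: List[Point], voxel_size: float) -> Dict[VoxelKey, List[int]]:
    # Group point indices by the regular grid cell (voxel) they fall into.
    grid: Dict[VoxelKey, List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        grid[voxel_key(p, voxel_size)].append(i)
    return dict(grid)


def voxel_features(points: List[Point],
                   grid: Dict[VoxelKey, List[int]]) -> Dict[VoxelKey, Tuple[float, ...]]:
    # Hypothetical per-voxel descriptor: centroid (x, y, z) plus point count.
    # Stands in for the semantic features a convolutional backbone would produce.
    feats: Dict[VoxelKey, Tuple[float, ...]] = {}
    for key, idx in grid.items():
        n = len(idx)
        centroid = tuple(sum(points[i][d] for i in idx) / n for d in range(3))
        feats[key] = centroid + (float(n),)
    return feats


def voxel_to_point(points: List[Point], voxel_size: float,
                   feats: Dict[VoxelKey, Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    # Map each voxel's feature back to the points inside it and append the
    # point's exact coordinates and its offset from the voxel centroid,
    # geometric detail that is lost once points are quantized to the grid.
    out = []
    for p in points:
        cx, cy, cz, n = feats[voxel_key(p, voxel_size)]
        out.append((p[0], p[1], p[2], p[0] - cx, p[1] - cy, p[2] - cz, n))
    return out


if __name__ == "__main__":
    pts = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.5, 0.5, 0.5)]
    grid = voxelize(pts, voxel_size=1.0)
    feats = voxel_features(pts, grid)
    for row in voxel_to_point(pts, 1.0, feats):
        print([round(v, 2) for v in row])
```

Each output row mixes grid-level context (shared by all points in a voxel) with per-point geometry, which is the intuition behind doing the final detection in point space rather than on voxel anchors.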

updated: Sun Oct 10 2021 04:43:27 GMT+0000 (UTC)

published: Sun Oct 10 2021 04:43:27 GMT+0000 (UTC)

arXiv