PV-RCNN++: Semantical Point-Voxel Feature Interaction for 3D Object Detection

Peng Wu; Lipeng Gu; Xuefeng Yan; Haoran Xie; Fu Lee Wang; Gary Cheng; Mingqiang Wei

A large imbalance often exists between foreground points (i.e., objects) and background points in outdoor LiDAR point clouds. This imbalance hinders cutting-edge detectors from focusing on informative areas and producing accurate 3D object detection results. This paper proposes a novel object detection network based on semantical point-voxel feature interaction, dubbed PV-RCNN++. Unlike most existing methods, PV-RCNN++ exploits semantic information to improve detection quality. First, a semantic segmentation module is proposed to retain more discriminative foreground keypoints. This module guides PV-RCNN++ to integrate more object-related point-wise and voxel-wise features in the pivotal areas. Then, to make points and voxels interact efficiently, we use a voxel query based on Manhattan distance to quickly sample voxel-wise features around keypoints. Compared to the ball query, this voxel query reduces the time complexity from O(N) to O(K). Further, to avoid learning only local features, an attention-based residual PointNet module is designed to expand the receptive field and adaptively aggregate the neighboring voxel-wise features into keypoints. Extensive experiments on the KITTI dataset show that PV-RCNN++ achieves 81.60%, 40.18%, and 68.21% 3D mAP on Car, Pedestrian, and Cyclist, respectively, which is comparable to or better than the state of the art.
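The abstract does not spell out how the Manhattan-distance voxel query is implemented, so the following is only a minimal sketch of the general idea, not the authors' code. It assumes non-empty voxels are stored in a dictionary keyed by integer grid index, and that a keypoint is mapped to its containing voxel by floor division. The names `manhattan_offsets`, `voxel_query`, `voxel_table`, `radius`, and `max_samples` are all hypothetical. The point it illustrates is that the lookup visits a fixed set of neighboring grid cells rather than scanning every point, which is what separates a grid-based query from a ball query.

```python
from itertools import product


def manhattan_offsets(radius):
    """All integer 3D offsets whose Manhattan (L1) norm is at most `radius`."""
    r = range(-radius, radius + 1)
    return [
        (dx, dy, dz)
        for dx, dy, dz in product(r, r, r)
        if abs(dx) + abs(dy) + abs(dz) <= radius
    ]


def voxel_query(keypoint, voxel_table, voxel_size, radius, max_samples):
    """Collect up to `max_samples` non-empty voxels near `keypoint`.

    `voxel_table` maps an integer (i, j, k) grid index to that voxel's
    feature (an assumed storage layout, chosen for illustration).
    """
    # Grid index of the voxel that contains the keypoint.
    center = tuple(int(c // voxel_size) for c in keypoint)
    found = []
    # The number of candidate cells depends only on `radius`,
    # not on how many points or voxels the scene contains.
    for offset in manhattan_offsets(radius):
        index = tuple(c + o for c, o in zip(center, offset))
        if index in voxel_table:
            found.append((index, voxel_table[index]))
            if len(found) == max_samples:
                break
    return found


# Toy scene: four non-empty voxels with placeholder features.
voxel_table = {(0, 0, 0): "a", (1, 0, 0): "b", (1, 1, 0): "c", (5, 5, 5): "d"}
near = voxel_query((0.5, 0.5, 0.5), voxel_table, voxel_size=1.0, radius=1, max_samples=16)
```

In this toy scene, a radius of 1 reaches the voxels at `(0, 0, 0)` and `(1, 0, 0)` but not `(1, 1, 0)`, which lies at Manhattan distance 2 from the keypoint's voxel.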

updated: Mon Aug 29 2022 08:14:00 GMT+0000 (UTC)

published: Mon Aug 29 2022 08:14:00 GMT+0000 (UTC)

arXiv
