Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception

Xinge Zhu; Hui Zhou; Tai Wang; Fangzhou Hong; Wei Li; Yuexin Ma; Hongsheng Li; Ruigang Yang; Dahua Lin

State-of-the-art methods for LiDAR-based perception in driving scenes (including point cloud semantic segmentation, panoptic segmentation, and 3D detection) often project the point clouds to 2D space and then process them via 2D convolution. Although this combination is competitive on point clouds, it inevitably alters and abandons the 3D topology and geometric relations. A natural remedy is to use 3D voxelization and 3D convolution networks. However, we found that on outdoor point clouds the improvement obtained in this way is quite limited. An important reason lies in the properties of outdoor point clouds, namely sparsity and varying density. Motivated by this investigation, we propose a new framework for outdoor LiDAR segmentation, in which a cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern while maintaining these inherent properties. The proposed model acts as a backbone, and the features learned by this model can be used for downstream tasks such as point cloud semantic segmentation, panoptic segmentation, or 3D detection. In this paper, we benchmark our model on these three tasks. For semantic segmentation, we evaluate the proposed model on several large-scale datasets, i.e., SemanticKITTI, nuScenes, and A2D2. Our method achieves state-of-the-art results on the SemanticKITTI leaderboard (both the single-scan and multi-scan challenges) and significantly outperforms existing methods on the nuScenes and A2D2 datasets. Furthermore, the proposed 3D framework also shows strong performance and good generalization on LiDAR panoptic segmentation and LiDAR 3D detection.
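To make the idea of a cylindrical partition concrete, the following is a minimal sketch, not the authors' implementation. It maps a Cartesian LiDAR point to a voxel index on a (radius, angle, height) grid instead of a cubic (x, y, z) grid. The function name, the grid resolution, and the radius and height ranges are all hypothetical values chosen for illustration; the abstract does not specify them.

```python
import math

def cylindrical_voxel_index(x, y, z,
                            grid=(10, 8, 4),          # hypothetical (n_rho, n_theta, n_z)
                            rho_range=(0.0, 50.0),    # hypothetical radial range in metres
                            z_range=(-4.0, 2.0)):     # hypothetical height range in metres
    """Return the (rho, theta, z) voxel index of a point, or None if out of range."""
    rho = math.hypot(x, y)        # distance from the sensor axis
    theta = math.atan2(y, x)      # azimuth in [-pi, pi]

    if not (rho_range[0] <= rho < rho_range[1]):
        return None
    if not (z_range[0] <= z < z_range[1]):
        return None

    n_rho, n_theta, n_z = grid
    i = int((rho - rho_range[0]) / (rho_range[1] - rho_range[0]) * n_rho)
    # The modulo wraps theta == pi back onto the first angular bin.
    j = int((theta + math.pi) / (2 * math.pi) * n_theta) % n_theta
    k = int((z - z_range[0]) / (z_range[1] - z_range[0]) * n_z)
    return i, j, k


print(cylindrical_voxel_index(1.0, 0.0, 0.0))    # near point, directly ahead
print(cylindrical_voxel_index(100.0, 0.0, 0.0))  # beyond the assumed radial range
```

Because every angular bin spans the same angle, a cell far from the sensor covers a wider arc than a cell close to it. This is one plausible way a cylindrical grid can accommodate the varying density mentioned above, since distant regions contain fewer points per unit area.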

updated: Sun Sep 12 2021 06:25:11 GMT+0000 (UTC)

published: Sun Sep 12 2021 06:25:11 GMT+0000 (UTC)

arXiv
