Learning Spatial and Temporal Variations for 4D Point Cloud Segmentation

Shi Hanyu; Wei Jiacheng; Wang Hao; Liu Fayao; Lin Guosheng

LiDAR-based 3D scene perception is a fundamental and important task for autonomous driving. Most state-of-the-art methods for LiDAR-based 3D recognition focus on single-frame 3D point cloud data and ignore temporal information. We argue that the temporal information across frames provides crucial knowledge for 3D scene perception, especially in driving scenarios. In this paper, we focus on spatial and temporal variations to better exploit the temporal information across 3D frames. We design a temporal variation-aware interpolation module and a temporal voxel-point refiner to capture the temporal variation in the 4D point cloud. The temporal variation-aware interpolation generates local features from the previous and current frames by capturing spatial coherence and temporal variation information. The temporal voxel-point refiner builds a temporal graph on the 3D point cloud sequences and captures the temporal variation with a graph convolution module. It also transforms the coarse voxel-level predictions into fine point-level predictions. With the proposed modules, our new network, TVSN, achieves state-of-the-art performance on SemanticKITTI and SemanticPOSS. Specifically, our method achieves 52.5% mIoU (+5.5% over the previous best approach) on the multiple-scan segmentation task of SemanticKITTI, and 63.0% mIoU on SemanticPOSS (+2.8% over the previous best approach).
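The abstract does not give the equations of the temporal variation-aware interpolation. As a rough illustration only, the pure-Python sketch below shows one common way to carry features from a previous frame onto current-frame points. It uses inverse-distance-weighted k-nearest-neighbour interpolation and treats the difference between the current and interpolated features as a crude temporal-variation signal. The function names `knn` and `temporal_interpolate`, the default `k`, the `eps` term, and the output layout are hypothetical assumptions. This is not TVSN's actual module.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
Feature = Sequence[float]


def knn(query: Point, ref: Sequence[Point], k: int) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs for the k reference points closest to query."""
    dists = [(i, math.dist(query, p)) for i, p in enumerate(ref)]
    dists.sort(key=lambda t: t[1])
    return dists[:k]


def temporal_interpolate(
    cur_xyz: Sequence[Point],
    cur_feat: Sequence[Feature],
    prev_xyz: Sequence[Point],
    prev_feat: Sequence[Feature],
    k: int = 3,
    eps: float = 1e-8,
) -> List[List[float]]:
    """Hypothetical sketch, not the paper's module.

    For each current-frame point, gather features from its k nearest
    previous-frame points with inverse-distance weights (a stand-in for
    "spatial coherence"). Then append the current-minus-interpolated
    difference (a stand-in for "temporal variation").
    Output per point: [current | interpolated | variation].
    """
    out = []
    for q, f_cur in zip(cur_xyz, cur_feat):
        nbrs = knn(q, prev_xyz, k)
        # Closer neighbours get larger weights; eps avoids division by zero.
        weights = [1.0 / (d + eps) for _, d in nbrs]
        total = sum(weights)
        interp = [0.0] * len(f_cur)
        for (i, _), w in zip(nbrs, weights):
            for c, v in enumerate(prev_feat[i]):
                interp[c] += (w / total) * v
        variation = [a - b for a, b in zip(f_cur, interp)]
        out.append(list(f_cur) + interp + variation)
    return out


# Usage: two previous-frame points and one current-frame point.
prev_xyz = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
prev_feat = [[1.0], [3.0]]
print(temporal_interpolate([(0.0, 0.0, 0.0)], [[2.0]], prev_xyz, prev_feat, k=1))
```

A learned model would typically replace the fixed inverse-distance weights with weights predicted from relative positions and features. The fixed weights are used here only to keep the sketch dependency-free.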

updated: Mon Jul 11 2022 07:36:26 GMT+0000 (UTC)

published: Mon Jul 11 2022 07:36:26 GMT+0000 (UTC)

arXiv