LidarMultiNet: Towards a Unified Multi-task Network for LiDAR Perception

Dongqiangzi Ye; Zixiang Zhou; Weijia Chen; Yufei Xie; Yu Wang; Panqu Wang; Hassan Foroosh

LiDAR-based 3D object detection, semantic segmentation, and panoptic segmentation are usually implemented in specialized networks whose distinctive architectures are difficult to adapt to one another. This paper presents LidarMultiNet, a LiDAR-based multi-task network that unifies these three major LiDAR perception tasks. Among its many benefits, a multi-task network can reduce overall cost by sharing weights and computation among multiple tasks. However, it typically underperforms a combination of independent single-task models. The proposed LidarMultiNet aims to bridge the performance gap between the multi-task network and multiple single-task networks. At the core of LidarMultiNet is a strong 3D voxel-based encoder-decoder architecture with a Global Context Pooling (GCP) module that extracts global contextual features from a LiDAR frame. Task-specific heads are added on top of the network to perform the three LiDAR perception tasks. More tasks can be implemented simply by adding new task-specific heads, at little additional cost. A second stage is also proposed to refine the first-stage segmentation and generate accurate panoptic segmentation results. LidarMultiNet is extensively tested on both the Waymo Open Dataset and the nuScenes dataset, demonstrating for the first time that major LiDAR perception tasks can be unified in a single strong network that is trained end-to-end and achieves state-of-the-art performance. Notably, LidarMultiNet took the official 1st place in the Waymo Open Dataset 3D semantic segmentation challenge 2022, with the highest mIoU and the best accuracy for most of the 22 classes on the test set, using only LiDAR points as input. It also sets a new state of the art for a single model on the Waymo 3D object detection benchmark and three nuScenes benchmarks.
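The shared-trunk idea in the abstract (one backbone pass feeding several task-specific heads, with new tasks added as extra heads) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: `MultiTaskNet`, `toy_backbone`, and the three toy heads are hypothetical names invented for illustration, and the backbone merely centers the inputs by their frame-level mean as a crude stand-in for the voxel encoder-decoder with global context.

```python
from typing import Callable, Dict, List

Features = List[float]


def toy_backbone(points: List[float]) -> Features:
    """Hypothetical stand-in for the shared encoder-decoder.

    Centers every value by the frame-level mean, so each output
    depends on the whole frame (a crude proxy for global context).
    """
    mean = sum(points) / len(points)
    return [p - mean for p in points]


class MultiTaskNet:
    """Toy shared-trunk network: one backbone pass feeds every head."""

    def __init__(self, backbone: Callable[[List[float]], Features]) -> None:
        self.backbone = backbone
        self.heads: Dict[str, Callable[[Features], object]] = {}
        self.backbone_calls = 0  # how many times the shared trunk has run

    def add_head(self, name: str, head: Callable[[Features], object]) -> None:
        # Adding a task only registers a new head; the trunk is untouched.
        self.heads[name] = head

    def forward(self, points: List[float]) -> Dict[str, object]:
        self.backbone_calls += 1
        shared = self.backbone(points)  # computed once per frame
        return {name: head(shared) for name, head in self.heads.items()}


net = MultiTaskNet(toy_backbone)
# Toy "detection": index of the strongest response.
net.add_head("detection", lambda f: f.index(max(f)))
# Toy "semantic segmentation": a binary label per input element.
net.add_head("semantic", lambda f: [1 if v > 0 else 0 for v in f])

first = net.forward([1.0, 2.0, 3.0])

# A further task is added later without changing the backbone.
net.add_head("panoptic", lambda f: sum(1 for v in f if v > 0))
second = net.forward([1.0, 2.0, 3.0])
```

In this sketch the cost of the trunk is paid once per frame regardless of how many heads are registered, which is the cost-sharing argument the abstract makes; the actual heads, losses, and second-stage refinement are outside what this toy example covers.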

updated: Mon Sep 19 2022 23:39:15 GMT+0000 (UTC)

published: Mon Sep 19 2022 23:39:15 GMT+0000 (UTC)

arXiv
