MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving

Jiale Li; Hang Dai; Hao Han; Yong Ding

LiDAR and camera are two modalities available for 3D semantic segmentation in autonomous driving. Popular LiDAR-only methods suffer from inferior segmentation of small and distant objects because too few laser points fall on them, while robust multi-modal solutions remain under-explored. We investigate three crucial inherent difficulties: modality heterogeneity, the limited intersection of the sensors' fields of view, and multi-modal data augmentation. To mitigate modality heterogeneity, we propose a multi-modal 3D semantic segmentation model (MSeg3D) with joint intra-modal feature extraction and inter-modal feature fusion. The multi-modal fusion in MSeg3D consists of geometry-based feature fusion (GF-Phase), cross-modal feature completion, and semantic-based feature fusion (SF-Phase) on all visible points. Multi-modal data augmentation is reinvigorated by applying asymmetric transformations to the LiDAR point cloud and the multi-camera images individually, which benefits model training with diversified augmentation transformations. MSeg3D achieves state-of-the-art results on the nuScenes, Waymo, and SemanticKITTI datasets. With malfunctioning multi-camera input and with multi-frame point cloud input, MSeg3D still shows robustness and improves on the LiDAR-only baseline. Our code is publicly available at https://github.com/jialeli1/lidarseg3d.
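
The asymmetric augmentation mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the MSeg3D code base: the function names, the pinhole camera model, the choice of a rotation for the points and a horizontal flip for the image, and the assumption that the LiDAR and camera frames coincide are all hypothetical simplifications. The idea shown is that the point-to-pixel correspondence is computed once before augmentation, after which each modality is transformed independently and only the pixel coordinates are updated to follow the image transform.

```python
import math
import random


def project_to_pixels(points, focal, cx, cy):
    """Pinhole projection of points given in a camera-style frame
    (x right, y down, z forward). Hypothetical helper: returns one (u, v)
    per point, or None when the point lies behind the camera."""
    pixels = []
    for x, y, z in points:
        if z <= 0:
            pixels.append(None)
        else:
            pixels.append((focal * x / z + cx, focal * y / z + cy))
    return pixels


def rotate_points_y(points, angle):
    """LiDAR-side augmentation (assumed): rotate about the vertical y axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


def flip_image_horizontal(image, pixels):
    """Camera-side augmentation (assumed): mirror the image left to right
    and move the stored pixel coordinates along with it."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    moved = [None if p is None else (width - 1 - p[0], p[1]) for p in pixels]
    return flipped, moved


def asymmetric_augment(points, image, focal, cx, cy, rng):
    """Augment each modality on its own while keeping point-pixel pairs valid."""
    # Fix the correspondence using the un-augmented geometry.
    pixels = project_to_pixels(points, focal, cx, cy)
    # The point cloud gets its own transform; pixel coordinates are untouched.
    aug_points = rotate_points_y(points, rng.uniform(-math.pi, math.pi))
    # The image gets a different transform; only pixel coordinates follow it.
    if rng.random() < 0.5:
        image, pixels = flip_image_horizontal(image, pixels)
    return aug_points, image, pixels
```

Because the pixel coordinates are carried through the image transform rather than re-projected from the rotated points, each point still indexes the same image content after augmentation, which is what allows the two modalities to use different transformations.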

updated: Wed Mar 15 2023 13:13:03 GMT+0000 (UTC)

published: Wed Mar 15 2023 13:13:03 GMT+0000 (UTC)

arXiv
