Delivering Arbitrary-Modal Semantic Segmentation

Jiaming Zhang; Ruiping Liu; Hao Shi; Kailun Yang; Simon Reiß; Kunyu Peng; Haodong Fu; Kaiwei Wang; Rainer Stiefelhagen

Multimodal fusion can make semantic segmentation more robust. However, fusing an arbitrary number of modalities remains underexplored. To delve into this problem, we create the DeLiVER arbitrary-modal segmentation benchmark, covering Depth, LiDAR, multiple Views, Events, and RGB. In addition, we provide this dataset in four severe weather conditions as well as five sensor failure cases to exploit modal complementarity and resolve partial outages. To make this possible, we present the arbitrary cross-modal segmentation model CMNeXt. It encompasses a Self-Query Hub (SQ-Hub), which is designed to extract effective information from any modality for subsequent fusion with the RGB representation and adds only a negligible number of parameters (~0.01M) per additional modality. On top of this, to efficiently and flexibly harvest discriminative cues from the auxiliary modalities, we introduce the simple Parallel Pooling Mixer (PPX). In extensive experiments on a total of six benchmarks, our CMNeXt achieves state-of-the-art performance on the DeLiVER, KITTI-360, MFNet, NYU Depth V2, UrbanLF, and MCubeS datasets, and it scales from 1 to 81 modalities. On the freshly collected DeLiVER, the quad-modal CMNeXt reaches up to 66.30% in mIoU, a +9.10% gain compared to the mono-modal baseline. The DeLiVER dataset and our code are at: https://jamycheung.github.io/DELIVER.html.
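
The mIoU figures quoted above can be read with the help of a small sketch. The code below is not taken from the CMNeXt code base; it is a minimal, pure-Python illustration of how mean intersection-over-union is commonly computed from flattened label maps. The function name `mean_iou` and the toy labels are hypothetical, and the last line assumes that the reported +9.10% is a gain in absolute percentage points, which would put the mono-modal baseline at 57.20% mIoU.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that appear in pred or target.

    pred and target are flat sequences of integer class labels
    (hypothetical toy inputs, not DeLiVER data).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            # A correct pixel counts once in both intersection and union.
            intersection[p] += 1
            union[p] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    # Skip classes that occur in neither prediction nor ground truth.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious)


# Toy example with three classes and six pixels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
toy_miou = mean_iou(pred, target, num_classes=3)

# Assumption: the +9.10% gain is in absolute percentage points of mIoU.
implied_baseline = round(66.30 - 9.10, 2)
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so `toy_miou` is 0.5. Benchmark implementations typically accumulate a confusion matrix over the whole validation set and may ignore a void label; this sketch leaves those details out.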

updated: Thu Mar 02 2023 18:41:41 GMT+0000 (UTC)

published: Thu Mar 02 2023 18:41:41 GMT+0000 (UTC)

arXiv
