Similarity-Aware Fusion Network for 3D Semantic Segmentation

Linqing Zhao; Jiwen Lu; Jie Zhou

In this paper, we propose a similarity-aware fusion network (SAFNet) to adaptively fuse 2D images and 3D point clouds for 3D semantic segmentation. Existing fusion-based methods achieve remarkable performance by integrating information from multiple modalities. However, they rely heavily on the projection-based correspondence between 2D pixels and 3D points and can only fuse information in a fixed manner. As a result, their performance does not transfer easily to more realistic scenarios, where the collected data often lack strict pair-wise features for prediction. To address this, we employ a late fusion strategy: we first learn the geometric and contextual similarities between the input point cloud and the point cloud back-projected from 2D pixels, and then use these similarities to guide the fusion of the two modalities and further exploit their complementary information. Specifically, a geometric similarity module (GSM) directly compares the spatial coordinate distributions of pair-wise 3D neighborhoods, and a contextual similarity module (CSM) aggregates and compares the spatial contextual information of corresponding central points. Together, the two modules effectively measure how much the image features can help prediction, enabling the network to adaptively adjust the contributions of the two modalities to the final prediction of each point. Experimental results on the ScanNetV2 benchmark demonstrate that SAFNet significantly outperforms existing state-of-the-art fusion-based approaches across various levels of data integrity.
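
The abstract describes the GSM and CSM similarities as learned and does not give their formulas. The sketch below is only a rough, non-authoritative illustration of the general idea of similarity-guided late fusion, and it swaps in a hand-crafted stand-in rather than SAFNet's method. Around a given center point, it gathers the k nearest neighbors from the input cloud and from the back-projected cloud, compares the two offset sets with a symmetric Chamfer distance, and maps that distance to a similarity in (0, 1] with exp(-d / sigma). It then uses that similarity to weight image-branch class scores against point-branch class scores. All function names and the parameters `k` and `sigma` are hypothetical.

```python
import math

Point = tuple[float, float, float]


def knn_offsets(center: Point, cloud: list[Point], k: int) -> list[Point]:
    """Offsets from `center` to its k nearest points in `cloud` (hypothetical helper)."""
    nearest = sorted(cloud, key=lambda p: math.dist(p, center))[:k]
    return [tuple(p[i] - center[i] for i in range(3)) for p in nearest]


def chamfer(a: list[Point], b: list[Point]) -> float:
    """Symmetric Chamfer distance between two small point sets."""
    ab = sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    ba = sum(min(math.dist(q, p) for p in a) for q in b) / len(b)
    return ab + ba


def geometric_similarity(center: Point, input_cloud: list[Point],
                         backproj_cloud: list[Point], k: int = 4,
                         sigma: float = 0.1) -> float:
    """Hypothetical hand-crafted stand-in for a learned geometric similarity.

    Returns a value in (0, 1]; 1.0 means the two local neighborhoods coincide.
    """
    d = chamfer(knn_offsets(center, input_cloud, k),
                knn_offsets(center, backproj_cloud, k))
    return math.exp(-d / sigma)


def late_fuse(point_scores: list[float], image_scores: list[float],
              similarity: float) -> list[float]:
    """Blend per-class scores; the image branch gets more weight as similarity rises."""
    return [(1.0 - similarity) * p + similarity * q
            for p, q in zip(point_scores, image_scores)]


if __name__ == "__main__":
    scan = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
            (0.0, 0.0, 0.1), (1.0, 1.0, 1.0)]
    shifted = [(x + 0.3, y, z) for x, y, z in scan]  # simulate a misaligned back-projection
    origin = (0.0, 0.0, 0.0)
    print(geometric_similarity(origin, scan, scan))     # 1.0: identical neighborhoods
    print(geometric_similarity(origin, scan, shifted))  # below 1.0: local geometry disagrees
    print(late_fuse([1.0, 0.0], [0.0, 1.0], 0.25))      # [0.75, 0.25]
```

The convex combination keeps the fused scores on the same scale as the inputs and falls back to the point branch alone when the back-projected geometry disagrees. A learned module, as the abstract describes, would replace both the fixed distance and the fixed weighting rule.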

updated: Sat Jul 17 2021 05:25:11 GMT+0000 (UTC)

published: Sun Jul 04 2021 09:28:18 GMT+0000 (UTC)

arXiv