S3Net: 3D LiDAR Sparse Semantic Segmentation Network

Ran Cheng; Ryan Razani; Yuan Ren; Liu Bingbing

Semantic segmentation is a crucial component in the perception systems of many applications, such as robotics and autonomous driving, that rely on accurate environmental perception and understanding. Several approaches to the LiDAR semantic segmentation task have been introduced in the literature, including projection-based (range-view or bird's-eye-view) and voxel-based approaches. However, they either abandon the valuable 3D topology and geometric relations and suffer from the information loss introduced by the projection process, or they are inefficient. There is therefore a need for accurate models capable of processing the 3D driving-scene point cloud in 3D space. In this paper, we propose S3Net, a novel convolutional neural network for LiDAR point cloud semantic segmentation. It adopts an encoder-decoder backbone that consists of a Sparse Intra-channel Attention Module (SIntraAM) and a Sparse Inter-channel Attention Module (SInterAM) to emphasize the fine details both within each feature map and among nearby feature maps. To extract global context in deeper layers, we introduce a Sparse Residual Tower based on sparse convolution, which suits the varying sparsity of LiDAR point clouds. In addition, a geo-aware anisotropic loss is leveraged to emphasize the semantic boundaries and penalize the noise within each predicted region, leading to a robust prediction. Our experimental results show that the proposed method leads to a large improvement (12%) over its baseline counterpart, MinkNet42 (Choy et al., 2019), on the SemanticKITTI (Behley et al., 2019) test set and achieves state-of-the-art mIoU among semantic segmentation approaches.
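For readers unfamiliar with the metric reported above, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed from per-point labels. It is not taken from the paper: the function names `confusion_matrix` and `mean_iou`, the `ignore_label` parameter, and the choice to skip classes that never appear are all assumptions made for illustration, and the official SemanticKITTI evaluation may differ in such details.

```python
from typing import List, Optional, Sequence


def confusion_matrix(
    labels: Sequence[int],
    preds: Sequence[int],
    num_classes: int,
    ignore_label: Optional[int] = None,
) -> List[List[int]]:
    """Count points per (true class, predicted class) pair.

    `ignore_label` is a hypothetical convenience for unlabeled points.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_cls, pred_cls in zip(labels, preds):
        if true_cls == ignore_label:
            continue  # skip points without a valid ground-truth label
        matrix[true_cls][pred_cls] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average IoU = TP / (TP + FP + FN) over classes.

    Assumption: classes with no ground-truth and no predicted points
    are left out of the average instead of counting as zero.
    """
    num_classes = len(matrix)
    ious = []
    for c in range(num_classes):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp  # true class c, predicted otherwise
        fp = sum(matrix[r][c] for r in range(num_classes)) - tp
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example with three classes and five points.
    labels = [0, 0, 1, 1, 2]
    preds = [0, 1, 1, 1, 2]
    print(mean_iou(confusion_matrix(labels, preds, num_classes=3)))
```

Because every class contributes equally to the average regardless of how many points it contains, rare classes weigh as much as frequent ones in this metric.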

updated: Mon Mar 15 2021 22:15:24 GMT+0000 (UTC)

published: Mon Mar 15 2021 22:15:24 GMT+0000 (UTC)

arXiv
