FIDNet: LiDAR Point Cloud Semantic Segmentation with Fully Interpolation Decoding

Yiming Zhao; Lin Bai; Xinming Huang



Projecting a point cloud onto a 2D spherical range image turns LiDAR semantic segmentation into a 2D segmentation task on the range image. However, a LiDAR range image still differs inherently from a regular 2D RGB image; for example, each position on the range image encodes unique geometric information. In this paper, we propose a new projection-based LiDAR semantic segmentation pipeline that consists of a novel network structure and an efficient post-processing step. In our network structure, we design a FID (fully interpolation decoding) module that directly upsamples the multi-resolution feature maps using bilinear interpolation. Inspired by the 3D distance interpolation used in PointNet++, we argue that this FID module is a 2D version of distance interpolation in (θ, ϕ) space. As a parameter-free decoding module, the FID greatly reduces model complexity while maintaining good performance. Beyond the network structure, we find empirically that our model's predictions have clear boundaries between different semantic classes. This leads us to reconsider whether the widely used K-nearest-neighbor (KNN) post-processing is still necessary for our pipeline. We then observe that the many-to-one mapping causes a blurring effect: several points are projected into the same pixel and therefore share the same label. We thus propose to handle those occluded points by assigning them the nearest predicted label. In the ablation study, this NLA (nearest label assignment) post-processing step outperforms KNN and runs faster at inference. On the SemanticKITTI dataset, our pipeline achieves the best performance among all projection-based methods at 64 × 2048 resolution and among all point-wise solutions. With ResNet-34 as the backbone, both training and testing of our model can be completed on a single RTX 2080 Ti with 11 GB of memory. The code is released.
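
The many-to-one mapping and the nearest-label idea described in the abstract can be illustrated with a small sketch. This is not the authors' released code: the function names, the field-of-view values (3° up, 25° down, typical for a 64-beam sensor), the 3×3 search window, and the example labels are all assumptions chosen for illustration, and column wrap-around at the image border is ignored.

```python
import math


def spherical_project(points, height=64, width=2048,
                      fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map (x, y, z) points to (row, col) pixels of a range image.

    Returns the pixel of every point and, for each occupied pixel,
    the index of the closest point (the one that stays visible).
    The field-of-view defaults are assumed values, not from the paper.
    """
    fov_down = math.radians(fov_down_deg)
    fov = math.radians(fov_up_deg) - fov_down
    pixels, closest = [], {}
    for i, (x, y, z) in enumerate(points):
        r = math.sqrt(x * x + y * y + z * z)   # range; assumes r > 0
        yaw = math.atan2(y, x)                 # horizontal angle
        pitch = math.asin(z / r)               # vertical angle
        col = int(0.5 * (1.0 - yaw / math.pi) * width)
        row = int((1.0 - (pitch - fov_down) / fov) * height)
        col = min(max(col, 0), width - 1)
        row = min(max(row, 0), height - 1)
        pixels.append((row, col))
        # Many-to-one mapping: keep only the nearest point per pixel.
        if (row, col) not in closest or r < closest[(row, col)][0]:
            closest[(row, col)] = (r, i)
    visible = {p: idx for p, (_, idx) in closest.items()}
    return pixels, visible


def nearest_label_assignment(points, pixels, visible, pixel_labels, window=1):
    """Give each point the label of the 3D-nearest visible point found
    in a (2*window+1)^2 pixel neighborhood (a simplified NLA-style rule)."""
    labels = []
    for p, (row, col) in zip(points, pixels):
        best = None
        for dr in range(-window, window + 1):
            for dc in range(-window, window + 1):
                key = (row + dr, col + dc)
                if key not in visible:
                    continue
                d = math.dist(p, points[visible[key]])
                if best is None or d < best[0]:
                    best = (d, pixel_labels[key])
        labels.append(best[1])  # the point's own pixel is always occupied
    return labels


# Point 1 hides behind point 0 in the same pixel; point 2 is its true neighbor.
points = [(10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (20.0, 0.04, 0.0)]
pixels, visible = spherical_project(points)
pixel_labels = {pixels[0]: "car", pixels[2]: "building"}  # per-pixel predictions
labels = nearest_label_assignment(points, pixels, visible, pixel_labels)
```

In this toy case the occluded point at 20 m shares a pixel with the car point at 10 m, so copying the pixel label would mark it as "car"; searching the neighboring pixels by 3D distance assigns it "building" instead.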

updated: Wed Sep 08 2021 17:20:09 GMT+0000 (UTC)

published: Wed Sep 08 2021 17:20:09 GMT+0000 (UTC)

arXiv

