Sparse Cross-scale Attention Network for Efficient LiDAR Panoptic Segmentation

Shuangjie Xu; Rui Wan; Maosheng Ye; Xiaoyi Zou; Tongyi Cao

Two major challenges in 3D LiDAR panoptic segmentation (PS) are that an object's points are aggregated on its surface, which makes long-range dependencies hard to model, especially for large instances, and that nearby objects are often too close to separate from each other. Recent work addresses these problems either with time-consuming grouping processes such as dual clustering and mean-shift offsets, or with dense bird's-eye-view (BEV) centroid representations that downplay geometry. However, the local feature learning in these methods does not sufficiently model long-range geometric relationships. To this end, we present SCAN, a novel sparse cross-scale attention network that first aligns multi-scale sparse features with global voxel-encoded attention to capture the long-range relationships of instance context, which boosts regression accuracy for large objects that tend to be over-segmented. For the surface-aggregated points, SCAN adopts a novel sparse class-agnostic representation of instance centroids, which not only maintains the sparsity of the aligned features to resolve under-segmentation of small objects, but also reduces the network's computation through sparse convolution. Our method outperforms previous methods by a large margin on the SemanticKITTI dataset for the challenging 3D PS task, achieving 1st place with real-time inference speed.
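To make the idea of a sparse, class-agnostic centroid representation more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes that points are quantized into voxels, that instance id 0 means "no instance", and that each instance centroid is snapped to the nearest occupied voxel of that instance (because surface points can leave the true centroid in an empty voxel). The function name `sparse_centroid_targets`, the `voxel_size` parameter, and the snapping rule are all hypothetical choices made for illustration.

```python
import numpy as np


def sparse_centroid_targets(points, instance_ids, voxel_size=0.5):
    """Hypothetical helper: build class-agnostic centroid labels on occupied voxels only.

    Returns (occupied, labels): occupied is an (M, 3) integer array of voxel
    coordinates that contain at least one point, and labels is an (M,) array
    with 1 on the voxel chosen for each instance centroid and 0 elsewhere.
    """
    points = np.asarray(points, dtype=np.float64)
    instance_ids = np.asarray(instance_ids)

    # Quantize every point to an integer voxel coordinate.
    voxel_coords = np.floor(points / voxel_size).astype(np.int64)

    # Keep only occupied voxels, so the representation stays sparse
    # (no dense BEV grid is ever allocated).
    occupied = np.unique(voxel_coords, axis=0)
    labels = np.zeros(len(occupied), dtype=np.int64)

    for inst in np.unique(instance_ids):
        if inst == 0:  # assumption: id 0 marks points without an instance
            continue
        mask = instance_ids == inst
        centroid = points[mask].mean(axis=0)

        # Assumption: snap the centroid to the nearest occupied voxel of the
        # same instance, since the geometric centroid may lie in empty space.
        inst_voxels = np.unique(voxel_coords[mask], axis=0)
        centers = (inst_voxels + 0.5) * voxel_size
        nearest = inst_voxels[np.argmin(np.linalg.norm(centers - centroid, axis=1))]

        # Mark that voxel; the label carries no semantic class.
        idx = np.flatnonzero((occupied == nearest).all(axis=1))[0]
        labels[idx] = 1

    return occupied, labels
```

In this sketch the label is a single binary channel shared by all classes, and it is stored only for occupied voxels, which is the property that would let a sparse convolutional head consume it without densifying the grid.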

updated: Sun Jan 16 2022 05:34:54 GMT+0000 (UTC)

published: Sun Jan 16 2022 05:34:54 GMT+0000 (UTC)

arXiv
