Efficient 3D Semantic Segmentation with Superpoint Transformer

Damien Robert; Hugo Raguet; Loic Landrieu

We introduce a novel superpoint-based transformer architecture for efficient semantic segmentation of large-scale 3D scenes. Our method incorporates a fast algorithm to partition point clouds into a hierarchical superpoint structure, which makes our preprocessing 7 times faster than existing superpoint-based approaches. Additionally, we leverage a self-attention mechanism to capture the relationships between superpoints at multiple scales, leading to state-of-the-art performance on three challenging benchmark datasets: S3DIS (76.0% mIoU with 6-fold validation), KITTI-360 (63.5% on Val), and DALES (79.6%). With only 212k parameters, our approach is up to 200 times more compact than other state-of-the-art models while maintaining similar performance. Furthermore, our model can be trained on a single GPU in 3 hours for one fold of the S3DIS dataset, which requires 7 to 70 times fewer GPU-hours than the best-performing methods. Our code and models are accessible at github.com/drprojects/superpoint_transformer.
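The benchmark scores quoted in the abstract are reported as mIoU (mean intersection-over-union). As a reading aid, here is a minimal sketch of how that metric is commonly computed from per-point class labels. The function name `mean_iou` and the toy labels are illustrative assumptions, not taken from the authors' repository, and benchmark evaluations usually accumulate counts over a whole dataset rather than a single sample.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over the classes that occur in pred or target.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    ious = []
    for c in range(num_classes):
        # Points where prediction and ground truth both say class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Points where either prediction or ground truth says class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both label sets
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example: six points, three classes (made-up labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. A reported figure such as 76.0% corresponds to a value of 0.760 on the same scale.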

updated: Sat Aug 12 2023 09:55:56 GMT+0000 (UTC)

published: Tue Jun 13 2023 18:03:05 GMT+0000 (UTC)

arXiv
