Fast Point Transformer

Chunghyun Park; Yoonwoo Jeong; Minsu Cho; Jaesik Park

The recent success of neural networks enables a better interpretation of 3D point clouds, but processing a large-scale 3D scene remains a challenging problem. Most current approaches divide a large-scale scene into small regions and combine the local predictions. However, this scheme inevitably requires additional pre- and post-processing stages and may also degrade the final output, because each prediction is made from a local perspective. This paper introduces Fast Point Transformer, which consists of a new lightweight self-attention layer. Our approach encodes continuous 3D coordinates, and the voxel hashing-based architecture boosts computational efficiency. The proposed method is demonstrated on 3D semantic segmentation and 3D detection. The accuracy of our approach is competitive with the best voxel-based method, and our network achieves inference 129 times faster than the state-of-the-art Point Transformer, with a reasonable accuracy trade-off, in 3D semantic segmentation on the S3DIS dataset.
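To make the idea of voxel hashing over continuous coordinates more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names (`voxelize`, `centroid_offsets`, `occupied_neighbors`), the voxel size, and the choice to keep each point's offset from its voxel centroid are all assumptions made for illustration. It only shows how a hash table keyed by integer voxel indices gives constant-time neighbor lookups while the continuous positions are still retained.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size):
    """Group point indices by the integer voxel cell that contains each point."""
    table = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        # floor (not int truncation) keeps negative coordinates in the right cell
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        table[key].append(idx)
    return dict(table)


def centroid_offsets(points, table):
    """Compute per-voxel centroids and each point's offset from its centroid.

    Assumption: keeping these offsets is one way to preserve continuous
    coordinates after quantization; it is illustrative only.
    """
    centroids, offsets = {}, {}
    for key, members in table.items():
        n = len(members)
        c = tuple(sum(points[i][d] for i in members) / n for d in range(3))
        centroids[key] = c
        for i in members:
            offsets[i] = tuple(points[i][d] - c[d] for d in range(3))
    return centroids, offsets


def occupied_neighbors(table, key, radius=1):
    """Return occupied voxels in the cubic neighborhood of `key` via hash lookups."""
    kx, ky, kz = key
    found = []
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dz in range(-radius, radius + 1):
                cand = (kx + dx, ky + dy, kz + dz)
                if cand in table:
                    found.append(cand)
    return found


# Hypothetical toy input: four points, voxel edge length 1.0
points = [(0.1, 0.1, 0.1), (0.3, 0.2, 0.1), (1.2, 0.1, 0.1), (5.0, 5.0, 5.0)]
table = voxelize(points, voxel_size=1.0)
centroids, offsets = centroid_offsets(points, table)
neighbors = occupied_neighbors(table, (0, 0, 0))
```

In this toy input, the first two points share a voxel, the third falls in an adjacent voxel, and the fourth is far away, so a neighborhood query around the first voxel touches only the occupied cells nearby. The cost of each query depends on the neighborhood size rather than on the total number of points in the scene, which is the general reason a hash-based layout suits large scenes.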

updated: Mon Apr 04 2022 12:51:48 GMT+0000 (UTC)

published: Thu Dec 09 2021 05:04:10 GMT+0000 (UTC)

arXiv
