3D Object Detection with Pointformer

Xuran Pan; Zhuofan Xia; Shiji Song; Li Erran Li; Gao Huang

Feature learning for 3D object detection from point clouds is very challenging due to the irregularity of 3D point cloud data. In this paper, we propose Pointformer, a Transformer backbone designed for 3D point clouds to learn features effectively. Specifically, a Local Transformer module is employed to model interactions among points in a local region, which learns context-dependent region features at an object level. A Global Transformer is designed to learn context-aware representations at the scene level. To further capture the dependencies among multi-scale representations, we propose a Local-Global Transformer that integrates local features with higher-resolution global features. In addition, we introduce an efficient coordinate refinement module that shifts down-sampled points closer to object centroids, which improves object proposal generation. We use Pointformer as the backbone for state-of-the-art object detection models and demonstrate significant improvements over the original models on both indoor and outdoor datasets.
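The abstract describes two linked mechanisms: self-attention among the points of one local region, and a module that shifts each down-sampled point toward an object centroid. The pure-Python sketch below illustrates both under loudly stated assumptions. It is not the authors' implementation. The Q/K/V projections are identity maps (a real layer learns them and adds positional encodings). The refinement step is assumed to be an attention-weighted average of the neighbors' coordinates, using the center point's attention row. All helper names (`k_nearest`, `local_self_attention`, `refine_coordinate`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vec:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def k_nearest(points: Sequence[Vec], center: int, k: int) -> List[int]:
    # Local region: the center index first, then its k-1 nearest other points.
    c = points[center]

    def sq_dist(j: int) -> float:
        return sum((p - q) ** 2 for p, q in zip(points[j], c))

    others = sorted((j for j in range(len(points)) if j != center), key=sq_dist)
    return [center] + others[: k - 1]


def local_self_attention(
    feats: Sequence[Vec], group: Sequence[int]
) -> Tuple[List[Vec], List[Vec]]:
    # Scaled dot-product self-attention restricted to one local group.
    # Assumption: identity Q/K/V projections (hypothetical simplification).
    d = len(feats[0])
    members = [feats[j] for j in group]
    outputs, attn = [], []
    for q in members:
        w = softmax([dot(q, k) / math.sqrt(d) for k in members])
        attn.append(w)
        outputs.append(
            [sum(w[j] * members[j][c] for j in range(len(members))) for c in range(d)]
        )
    return outputs, attn


def refine_coordinate(
    points: Sequence[Vec], group: Sequence[int], center_attn: Sequence[float]
) -> Vec:
    # Hypothetical coordinate refinement: replace the sampled center with the
    # attention-weighted average of its group members' coordinates.
    dim = len(points[0])
    return [sum(w * points[j][c] for w, j in zip(center_attn, group)) for c in range(dim)]


if __name__ == "__main__":
    points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [5.0, 5.0, 5.0]]
    feats = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
    group = k_nearest(points, center=0, k=3)  # [0, 1, 2]
    _, attn = local_self_attention(feats, group)
    print(refine_coordinate(points, group, attn[0]))
```

The attention weights in each row sum to 1. Because of this, the refined coordinate is a convex combination of the group's points and always stays inside their local region. Batching, multiple heads, and learned projections are omitted for readability.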

updated: Mon Jun 21 2021 08:33:35 GMT+0000 (UTC)

published: Mon Dec 21 2020 15:12:54 GMT+0000 (UTC)

arXiv
