Self-Attention Based Context-Aware 3D Object Detection

Prarthana Bhattacharyya; Chengjie Huang; Krzysztof Czarnecki

Most existing point-cloud-based 3D object detectors use convolution-like operators to process information in a local neighborhood with fixed-weight kernels and aggregate global context hierarchically. However, recent work on non-local neural networks and self-attention for 2D vision has shown that explicitly modeling global context and long-range interactions between positions can lead to more robust and competitive models. In this paper, we explore two variants of self-attention for contextual modeling in 3D object detection by augmenting convolutional features with self-attention features. We first incorporate the pairwise self-attention mechanism into current state-of-the-art BEV (bird's-eye view), voxel, and point-based detectors and show consistent improvement over strong baseline models while significantly reducing their parameter footprint and computational cost. We also propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations. This not only allows us to scale explicit global contextual modeling to larger point clouds, but also leads to more discriminative and informative feature descriptors. Our method can be flexibly applied to most state-of-the-art detectors, increasing accuracy as well as parameter and compute efficiency. We achieve new state-of-the-art detection performance on the KITTI and nuScenes datasets. Code is available at https://github.com/AutoVision-cloud/SA-Det3D.
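To make the idea of augmenting features with pairwise self-attention concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (see the repository linked above for that). It assumes identity query/key/value projections instead of learned ones, scaled dot-product similarity, and concatenation as the way context is attached to the input features; the function names `pairwise_self_attention` and `augment_with_context` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pairwise_self_attention(features):
    """Return one context vector per input feature.

    Each context vector is a weighted average of ALL input features,
    so every position can draw on every other position.
    Assumption: queries, keys, and values are the features themselves
    (no learned projections), to keep the sketch short.
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    contexts = []
    for query in features:
        # Scaled dot-product similarity between this feature and every feature.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in features]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one coordinate at a time.
        context = [
            sum(w * value[j] for w, value in zip(weights, features))
            for j in range(dim)
        ]
        contexts.append(context)
    return contexts


def augment_with_context(features):
    """Concatenate each feature with its global context vector.

    Assumption: concatenation is used here for illustration; other ways
    of combining the two (for example, a residual sum) are equally plausible.
    """
    contexts = pairwise_self_attention(features)
    return [feat + ctx for feat, ctx in zip(features, contexts)]


if __name__ == "__main__":
    # Three toy 2-D features standing in for convolutional features.
    toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in augment_with_context(toy):
        print(row)
```

Because every feature attends to every other feature, the cost grows quadratically with the number of features. This is the scaling pressure that motivates attending over a sampled subset of representative features, as the abstract describes for the second variant.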

updated: Thu Jan 07 2021 18:30:32 GMT+0000 (UTC)

published: Thu Jan 07 2021 18:30:32 GMT+0000 (UTC)

arXiv
