SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection

Prarthana Bhattacharyya; Chengjie Huang; Krzysztof Czarnecki

Existing point-cloud based 3D object detectors use convolution-like operators to process information in a local neighbourhood with fixed-weight kernels and aggregate global context hierarchically. However, non-local neural networks and self-attention for 2D vision have shown that explicitly modeling long-range interactions can lead to more robust and competitive models. In this paper, we propose two variants of self-attention for contextual modeling in 3D object detection by augmenting convolutional features with self-attention features. We first incorporate the pairwise self-attention mechanism into the current state-of-the-art BEV, voxel and point-based detectors and show consistent improvement over strong baseline models of up to 1.5 3D AP while simultaneously reducing their parameter footprint and computational cost by 15-80% and 30-50%, respectively, on the KITTI validation set. We next propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations. This not only allows us to scale explicit global contextual modeling to larger point-clouds, but also leads to more discriminative and informative feature descriptors. Our method can be flexibly applied to most state-of-the-art detectors with increased accuracy and parameter and compute efficiency. We show our proposed method improves 3D object detection performance on KITTI, nuScenes and Waymo Open datasets. Code is available at https://github.com/AutoVision-cloud/SA-Det3D.
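The following is a minimal sketch of the two ideas described in the abstract, not the authors' implementation (which is in the linked repository). It shows pairwise self-attention, where every feature attends to every other feature, and a cheaper variant in which features attend only to a small sampled subset. Several things here are assumptions made for illustration: the function names `attend` and `augment`, the identity query/key/value projections, the use of a residual sum to combine the input features with the attention features, and the use of plain random sampling in place of the learned deformations the abstract mentions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, context):
    """Scaled dot-product attention with identity projections.

    Assumption: a real detector would use learned query/key/value
    projections; they are omitted to keep the sketch short.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every context feature.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        # Weighted sum of context features, one output channel at a time.
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, context)) for c in range(dim)]
        )
    return outputs


def augment(features, context=None):
    """Add attention features to the input features (assumed residual sum)."""
    context = features if context is None else context
    attended = attend(features, context)
    return [[f + a for f, a in zip(feat, att)] for feat, att in zip(features, attended)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-in for per-pillar, per-voxel or per-point features.
    feats = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]

    full = augment(feats)                   # full pairwise: 8 x 8 interactions
    subset = rng.sample(feats, 3)           # random subset, no learned deformation
    cheap = augment(feats, context=subset)  # reduced: 8 x 3 interactions
    print(len(full), len(full[0]), len(cheap), len(cheap[0]))
```

With N features and a subset of size M, the full variant computes N x N attention scores while the subset variant computes N x M, which is the reason a sampled context scales more easily to larger point clouds.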

updated: Tue Mar 16 2021 16:53:42 GMT+0000 (UTC)

published: Thu Jan 07 2021 18:30:32 GMT+0000 (UTC)

arXiv
