RAANet: Range-Aware Attention Network for LiDAR-based 3D Object Detection with Auxiliary Density Level Estimation

Yantao Lu; Xuetao Hao; Shiqi Sun; Weiheng Chai; Yu Ding; Muchenxuan Tong; Senem Velipasalar

3D object detection from LiDAR data for autonomous driving has made remarkable strides in recent years. Among state-of-the-art methods, encoding point clouds into a bird's-eye view (BEV) has proven both effective and efficient. Unlike perspective views, BEV preserves rich spatial and distance information between objects. However, although farther objects of the same type do not appear smaller in the BEV, they contain sparser point cloud features. This weakens BEV feature extraction with shared-weight convolutional neural networks. To address this challenge, we propose the Range-Aware Attention Network (RAANet), which extracts more powerful BEV features and generates superior 3D object detections. The range-aware attention (RAA) convolutions significantly improve feature extraction for both near and far objects. Moreover, we propose a novel auxiliary loss for density estimation to further enhance the detection accuracy of RAANet for occluded objects. It is worth noting that the proposed RAA convolution is lightweight and can be integrated into any CNN architecture used for BEV detection. Extensive experiments on the nuScenes dataset demonstrate that the proposed approach outperforms state-of-the-art methods for LiDAR-based 3D object detection, with real-time inference speeds of 16 Hz for the full version and 22 Hz for the lite version. The code is publicly available in an anonymous GitHub repository: https://github.com/anonymous0522/RAAN.
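The observation that far objects keep their BEV footprint but receive fewer LiDAR returns can be illustrated with a little geometry. The sketch below is not part of RAANet. It assumes a flat target facing the sensor and a single ring of beams with a fixed azimuth resolution. The function names, the 2 m width, the 0.2 degree resolution, and the 0.5 m cell size are all hypothetical values chosen for illustration.

```python
import math


def beams_hitting_target(width_m, range_m, azimuth_res_deg):
    """Approximate number of horizontal beams that hit a flat target.

    Assumption: the target faces the sensor, so it subtends an angle of
    2 * atan(width / (2 * range)) in azimuth.
    """
    subtended_rad = 2.0 * math.atan(width_m / (2.0 * range_m))
    return int(subtended_rad / math.radians(azimuth_res_deg))


def bev_cells_covered(width_m, cell_m):
    """Number of BEV cells spanned by the target along one axis.

    This depends only on the metric width, not on the range.
    """
    return math.ceil(width_m / cell_m)


if __name__ == "__main__":
    width = 2.0        # hypothetical target width in metres
    resolution = 0.2   # hypothetical azimuth resolution in degrees
    cell = 0.5         # hypothetical BEV cell size in metres
    for rng in (10.0, 20.0, 40.0):
        hits = beams_hitting_target(width, rng, resolution)
        cells = bev_cells_covered(width, cell)
        print(f"range {rng:5.1f} m: {cells} BEV cells, about {hits} beam hits")
```

Under these assumptions the number of BEV cells stays constant while the number of beam hits falls roughly in proportion to 1/range. This is the density gap that a shared-weight convolution sees when the same kernel is applied at every distance.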

updated: Tue Mar 15 2022 16:49:31 GMT+0000 (UTC)

published: Thu Nov 18 2021 04:20:13 GMT+0000 (UTC)

arXiv
