PersDet: Monocular 3D Detection in Perspective Bird's-Eye-View

Hongyu Zhou; Zheng Ge; Weixin Mao; Zeming Li

Detecting 3D objects in Bird's-Eye-View (BEV) currently outperforms other 3D detection paradigms for autonomous driving and robotics. However, transforming image features into BEV requires special operators to perform feature sampling. These operators are not supported on many edge devices, which creates extra obstacles when deploying detectors. To address this problem, we revisit the generation of BEV representation and propose detecting objects in perspective BEV, a new BEV representation that does not require feature sampling. We demonstrate that perspective BEV features likewise enjoy the benefits of the BEV paradigm. Moreover, perspective BEV improves detection performance by addressing issues caused by feature sampling. Based on this finding, we propose PersDet for high-performance object detection in perspective BEV space. While keeping a simple and memory-efficient structure, PersDet outperforms existing state-of-the-art monocular methods on the nuScenes benchmark, reaching 34.6% mAP and 40.8% NDS with a ResNet-50 backbone.

updated: Fri Aug 19 2022 15:19:20 GMT+0000 (UTC)

published: Fri Aug 19 2022 15:19:20 GMT+0000 (UTC)

arXiv
