Lidar Point Cloud Guided Monocular 3D Object Detection

Liang Peng; Fei Liu; Zhengxu Yu; Senbo Yan; Dan Deng; Zheng Yang; Haifeng Liu; Deng Cai

Monocular 3D detection currently struggles with far lower detection accuracy than LiDAR-based methods. The poor accuracy is mainly caused by the absence of accurate location cues, owing to the ill-posed nature of monocular imagery. LiDAR point clouds, which provide precise spatial measurements, can offer beneficial information for training monocular methods. To make use of LiDAR point clouds, prior works project them to form depth map labels and then train a dense depth estimator to extract explicit location features. This indirect and complicated approach introduces intermediate products, i.e., depth map predictions, which incur substantial computational cost and lead to suboptimal performance. In this paper, we propose LPCG (LiDAR point cloud guided monocular 3D object detection), a general framework for guiding the training of monocular 3D detectors with LiDAR point clouds. Specifically, we use LiDAR point clouds to generate pseudo labels, allowing monocular 3D detectors to benefit from massive amounts of easily collected unlabeled data. LPCG works well under both supervised and unsupervised setups. Thanks to its general design, LPCG can be plugged into any monocular 3D detector and significantly boosts its performance. As a result, we take first place on the KITTI monocular 3D/BEV (bird's-eye-view) detection benchmark by a considerable margin. The code will be made publicly available soon.
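To make the "project LiDAR points to form depth map labels" step of prior works concrete, here is a minimal sketch using a standard pinhole camera model. It is not LPCG and not any specific prior implementation. It assumes the points have already been transformed into the camera frame (x right, y down, z forward), so the LiDAR-to-camera extrinsic transform is omitted. The function name `project_to_sparse_depth` and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are hypothetical names chosen for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# A 3D point (x, y, z) in camera coordinates; assumed already transformed from the LiDAR frame.
Point = Tuple[float, float, float]


def project_to_sparse_depth(
    points_cam: Sequence[Point],
    fx: float, fy: float, cx: float, cy: float,
    width: int, height: int,
) -> List[List[Optional[float]]]:
    """Hypothetical helper: project camera-frame points with a pinhole model
    and keep the nearest depth per pixel. Pixels hit by no point stay None,
    which is why a LiDAR-derived depth label is sparse."""
    depth: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for x, y, z in points_cam:
        if z <= 0.0:
            continue  # behind the camera: cannot appear in the image
        u = round(fx * x / z + cx)  # column index on the image plane
        v = round(fy * y / z + cy)  # row index on the image plane
        if 0 <= u < width and 0 <= v < height:
            current = depth[v][u]
            # Several points may land on one pixel; keep the closest (visible) one.
            if current is None or z < current:
                depth[v][u] = z
    return depth


if __name__ == "__main__":
    # Toy example: two points on the optical axis, one projecting outside a 5x5 image,
    # and one behind the camera.
    pts = [(0.0, 0.0, 10.0), (0.0, 0.0, 5.0), (1.0, 0.0, 2.0), (0.0, 0.0, -3.0)]
    dm = project_to_sparse_depth(pts, fx=100.0, fy=100.0, cx=2.0, cy=2.0, width=5, height=5)
    print(dm[2][2])  # nearest depth at the principal point
```

Even in this toy case only one of 25 pixels receives a label. That sparsity, together with the need to train a separate dense depth estimator on such labels, is the kind of intermediate step the abstract argues against.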

updated: Wed Sep 08 2021 12:07:50 GMT+0000 (UTC)

published: Mon Apr 19 2021 03:41:09 GMT+0000 (UTC)

arXiv