Lidar Point Cloud Guided Monocular 3D Object Detection

Liang Peng; Fei Liu; Zhengxu Yu; Senbo Yan; Dan Deng; Zheng Yang; Haifeng Liu; Deng Cai

Monocular 3D object detection is a challenging task in the self-driving and computer vision communities. Most previous works rely on manually annotated 3D box labels, which are expensive to produce. In this paper, we find that precisely and carefully annotated labels may be unnecessary in monocular 3D detection, which is an interesting and counterintuitive finding. Using rough labels that are randomly disturbed, the detector can achieve accuracy very close to that of a detector trained on the ground-truth labels. We delve into the underlying mechanism and empirically find that, concerning label accuracy, the 3D location part of the label matters more than its other parts. Motivated by these conclusions and considering the precision of LiDAR 3D measurements, we propose a simple and effective framework, dubbed LiDAR point cloud guided monocular 3D object detection (LPCG). This framework is capable of either reducing annotation costs or considerably boosting detection accuracy without introducing extra annotation costs. Specifically, it generates pseudo labels from unlabeled LiDAR point clouds. Because LiDAR measurements are accurate in 3D space, the 3D location information in these pseudo labels is precise, so they can replace manually annotated labels when training monocular 3D detectors. LPCG can be applied to any monocular 3D detector to make full use of the massive unlabeled data in a self-driving system. As a result, on the KITTI benchmark, we take first place on both monocular 3D and BEV (bird's-eye-view) detection by a significant margin. On the Waymo benchmark, our method using 10% labeled data achieves accuracy comparable to the baseline detector using 100% labeled data. The code is released at https://github.com/SPengLiang/LPCG.
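The "randomly disturbed" rough labels mentioned in the abstract can be pictured with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper or the LPCG repository: the `Box3D` layout, the `perturb_box` helper, the uniform relative noise model, and the `keep_location` switch, which mimics a study where only the non-location parts of a label are disturbed.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Box3D:
    """Hypothetical 3D box label: center, dimensions, heading."""
    x: float
    y: float
    z: float
    l: float
    w: float
    h: float
    yaw: float


def perturb_box(box, rel_noise, rng, keep_location=False):
    """Return a roughened copy of `box`.

    Each center and dimension value is scaled by a factor drawn uniformly
    from [1 - rel_noise, 1 + rel_noise]; yaw gets additive noise of the same
    magnitude in radians. This noise model is an assumption, not the paper's.
    """
    def jitter(value):
        return value * (1.0 + rng.uniform(-rel_noise, rel_noise))

    updates = {
        "l": jitter(box.l),
        "w": jitter(box.w),
        "h": jitter(box.h),
        "yaw": box.yaw + rng.uniform(-rel_noise, rel_noise),
    }
    if not keep_location:
        # Also disturb the 3D location part of the label.
        updates.update(x=jitter(box.x), y=jitter(box.y), z=jitter(box.z))
    return replace(box, **updates)
```

Training one detector on labels produced with `keep_location=True` and another with `keep_location=False`, then comparing their accuracy, is one way to probe how much the location part of a label matters relative to the rest.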

updated: Mon Jul 25 2022 03:11:46 GMT+0000 (UTC)

published: Mon Apr 19 2021 03:41:09 GMT+0000 (UTC)

arXiv
