Ground Plane Matters: Picking Up Ground Plane Prior in Monocular 3D Object Detection

Fan Yang; Xinhao Xu; Hui Chen; Yuchen Guo; Jungong Han; Kai Ni; Guiguang Ding

The ground plane prior is a highly informative geometric cue in monocular 3D object detection (M3OD), yet most mainstream methods neglect it. In this paper, we identify two key factors that limit the applicability of the ground plane prior: the projection point localization issue and the ground plane tilt issue. To pick up the ground plane prior for M3OD, we propose the Ground Plane Enhanced Network (GPENet), which resolves both issues in one go. For the projection point localization issue, instead of using the bottom vertices or bottom center of the 3D bounding box (BBox), we leverage the object's ground contact points, which are explicit pixels in the image and easy for the neural network to detect. For the ground plane tilt issue, GPENet estimates the horizon line in the image and derives a novel mathematical expression to accurately estimate the ground plane equation. An unsupervised vertical edge mining algorithm is also proposed to address occlusion of the horizon line. Furthermore, we design a novel 3D bounding box deduction method based on a dynamic back projection algorithm, which takes advantage of the accurate contact points and the ground plane equation. Additionally, contact point and horizon line pseudo labels can be generated from M3OD labels alone, with no extra data collection or annotation cost. Extensive experiments on the popular KITTI benchmark show that GPENet outperforms other methods and achieves state-of-the-art performance, demonstrating the effectiveness of the proposed approach. Moreover, GPENet works better than other methods in cross-dataset evaluation on the nuScenes dataset. Our code and models will be published.
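The abstract does not reproduce the paper's expression for the ground plane equation or its dynamic back projection algorithm. As a rough illustration of the underlying geometry only, the sketch below uses the textbook pinhole-camera relations: the horizon is the vanishing line of the ground plane, so the plane normal is proportional to K^T l, and a contact pixel is lifted to 3D by intersecting its viewing ray with that plane. The function names, the y-down camera frame, the known camera height, and the intrinsics used in the example are all assumptions made for illustration, not details taken from GPENet.

```python
import numpy as np


def ground_normal_from_horizon(K, horizon):
    """Unit ground-plane normal in camera coordinates (hypothetical helper).

    K       : 3x3 pinhole intrinsic matrix.
    horizon : line (a, b, c) with a*u + b*v + c = 0 in pixel coordinates.
    Uses the standard vanishing-line relation n ~ K^T l.
    """
    n = K.T @ np.asarray(horizon, dtype=float)
    n = n / np.linalg.norm(n)
    # Assumption: y-down camera frame, so the normal should point toward
    # the ground (positive y component).
    return n if n[1] >= 0 else -n


def back_project_to_ground(K, pixel, normal, cam_height):
    """Intersect the viewing ray of `pixel` with the plane normal . X = cam_height.

    cam_height is the assumed camera height above the ground, in metres.
    Returns the 3D point in camera coordinates.
    """
    ray = np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])
    denom = float(normal @ ray)
    if denom <= 1e-9:
        # The ray is parallel to the ground or points above the horizon.
        raise ValueError("pixel does not intersect the ground plane")
    return (cam_height / denom) * ray


# Illustrative values only (not calibration data from any dataset).
K = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])
horizon = (0.0, 1.0, -180.0)          # horizontal line v = 180
normal = ground_normal_from_horizon(K, horizon)
contact_3d = back_project_to_ground(K, (600.0, 250.0), normal, cam_height=1.65)
print(normal, contact_3d)
```

A tilted or shifted horizon changes `normal`, which in turn changes the recovered depth for the same contact pixel; this is the sensitivity that motivates estimating the ground plane per image instead of assuming a fixed flat ground.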

updated: Thu Nov 03 2022 02:21:35 GMT+0000 (UTC)

published: Thu Nov 03 2022 02:21:35 GMT+0000 (UTC)

arXiv
