Probabilistic and Geometric Depth: Detecting Objects in Perspective

Tai Wang; Xinge Zhu; Jiangmiao Pang; Dahua Lin

確率的および幾何学的な深さ：遠近法によるオブジェクトの検出

3Dオブジェクト検出は、運転支援システムなどのさまざまな実用的なアプリケーションで必要とされる重要な機能です。単眼3D検出は、両眼視またはLiDARに依存する従来の設定と比較して経済的なソリューションとして、最近ますます注目を集めていますが、それでも不十分な結果が得られます。このホワイトペーパーでは、最初にこの問題に関する体系的な調査を行い、現在の単眼3D検出問題をインスタンス深度推定問題として簡略化できることを確認します。インスタンス深度が不正確であると、他のすべての3D属性予測が全体的な検出パフォーマンスを向上させるのを妨げます。ただし、最近の方法では、さまざまなオブジェクト間の幾何学的関係を無視して、孤立したインスタンスまたはピクセルに基づいて深度を直接推定します。これは、深度に関する重要な情報が単眼画像に直接現れないため、貴重な制約となる可能性があります。したがって、予測されたオブジェクト全体で幾何学的関係グラフを作成し、そのグラフを使用して深度推定を容易にします。この不適切な設定では、各インスタンスの予備的な深度推定は通常不正確であるため、不確実性をキャプチャするために確率的表現を組み込みます。これは、信頼できる予測を識別し、深度の伝播をさらにガイドするための重要な指標を提供します。基本的な考え方は単純ですが、私たちの方法はKITTIとnuScenesのベンチマークを大幅に改善し、リアルタイムの効率を維持しながら、すべての単眼視覚のみの方法の中で1位を達成しています。コードとモデルはhttps://github.com/open-mmlab/mmdetection3dでリリースされます。

3D object detection is an important capability needed in various practical applications such as driver assistance systems. Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results. This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem: The inaccurate instance depth blocks all the other 3D attribute predictions from improving the overall detection performance. However, recent methods directly estimate the depth based on isolated instances or pixels while ignoring the geometric relations across different objects, which can be valuable constraints as the key information about depth is not directly manifest in the monocular image. Therefore, we construct geometric relation graphs across predicted objects and use the graph to facilitate depth estimation. As the preliminary depth estimation of each instance is usually inaccurate in this ill-posed setting, we incorporate a probabilistic representation to capture the uncertainty. It provides an important indicator to identify confident predictions and further guide the depth propagation. Despite the simplicity of the basic idea, our method obtains significant improvements on KITTI and nuScenes benchmarks, achieving the 1st place out of all monocular vision-only methods while still maintaining real-time efficiency. Code and models will be released at https://github.com/open-mmlab/mmdetection3d.

updated: Thu Jul 29 2021 16:30:33 GMT+0000 (UTC)

published: Thu Jul 29 2021 16:30:33 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト