MonoSIM: Simulating Learning Behaviors of Heterogeneous Point Cloud Object Detectors for Monocular 3D Object Detection

Han Sun; Zhaoxin Fan; Zhenbo Song; Zhicheng Wang; Kejian Wu; Jianfeng Lu

Monocular 3D object detection is a fundamental and important task for many applications, including autonomous driving, robotic grasping, and augmented reality. Existing leading methods tend to estimate the depth of the input image first and then detect 3D objects based on the resulting point cloud. This pipeline suffers from the inherent gap between depth estimation and object detection. In addition, accumulated prediction errors degrade performance. In this paper, a novel method named MonoSIM is proposed. The insight behind MonoSIM is to have a monocular detector simulate the feature learning behaviors of a point cloud based detector during training. Hence, during inference, the learned features and predictions are as similar as possible to those of the point cloud based detector. To achieve this, we propose one scene-level simulation module, one RoI-level simulation module, and one response-level simulation module, which are progressively applied across the detector's full feature learning and prediction pipeline. We apply our method to the well-known M3D-RPN and CaDDN detectors and conduct extensive experiments on the KITTI and Waymo Open datasets. Results show that our method consistently improves the performance of different monocular detectors by a large margin without changing their network architectures. Our code will be publicly available at https://github.com/sunh18/MonoSIM.
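The three simulation levels named in the abstract can be pictured as three imitation terms added to the monocular detector's training loss. The following is only a minimal sketch under strong assumptions: the abstract does not specify the loss functions, so mean squared error is used as a stand-in, and every name here (`mse`, `crop`, `simulation_loss`, the `weights` parameter) is hypothetical rather than taken from the MonoSIM code. It also assumes the monocular and point cloud features have already been brought into the same 2D grid, which a real system would have to handle separately.

```python
# Hypothetical sketch: function names, the MSE choice, and the loss weights
# are illustrative assumptions, not the MonoSIM authors' implementation.

def mse(a, b):
    """Mean squared error between two equal-length flat lists of floats."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def flatten(feature_map):
    """Flatten a 2D feature map (list of rows) into a flat list."""
    return [v for row in feature_map for v in row]


def crop(feature_map, box):
    """Crop a 2D feature map to box = (row0, col0, row1, col1), end-exclusive."""
    r0, c0, r1, c1 = box
    return [row[c0:c1] for row in feature_map[r0:r1]]


def simulation_loss(mono_feat, pc_feat, rois, mono_pred, pc_pred,
                    weights=(1.0, 1.0, 1.0)):
    """Weighted sum of scene-level, RoI-level, and response-level terms.

    mono_feat / pc_feat: 2D feature maps from the monocular (student) and
    point cloud (teacher) detectors, assumed to share one grid.
    rois: list of boxes at which RoI-level features are compared.
    mono_pred / pc_pred: flat lists of detector outputs (responses).
    """
    # Scene level: compare the whole feature maps.
    scene = mse(flatten(mono_feat), flatten(pc_feat))
    # RoI level: compare features only inside each region of interest.
    roi_terms = [
        mse(flatten(crop(mono_feat, b)), flatten(crop(pc_feat, b)))
        for b in rois
    ]
    roi = sum(roi_terms) / len(roi_terms) if roi_terms else 0.0
    # Response level: compare the final predictions.
    response = mse(mono_pred, pc_pred)
    w_scene, w_roi, w_resp = weights
    return w_scene * scene + w_roi * roi + w_resp * response
```

In such a setup the point cloud detector would be needed only at training time to supply the target features and predictions, which is consistent with the abstract's statement that the monocular detector's architecture is left unchanged.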

updated: Mon Dec 05 2022 16:12:54 GMT+0000 (UTC)

published: Fri Aug 19 2022 16:57:11 GMT+0000 (UTC)

arXiv
