SGM3D: Stereo Guided Monocular 3D Object Detection

Zheyuan Zhou; Liang Du; Xiaoqing Ye; Zhikang Zou; Xiao Tan; Li Zhang; Xiangyang Xue; Jianfeng Feng

Monocular 3D object detection aims to predict the location, dimensions, and orientation of objects in 3D space, along with their categories, given only a monocular image. The task is challenging because it is ill-posed: the 2D image plane critically lacks depth information. Although some approaches mitigate this problem by leveraging off-the-shelf depth estimation or relying on LiDAR sensors, their dependence on an additional depth model or expensive equipment severely limits their scalability to generic 3D perception. In this paper, we propose a stereo-guided monocular 3D object detection framework, dubbed SGM3D, which adapts the robust 3D features learned from stereo inputs to enhance the features used for monocular detection. We present a multi-granularity domain adaptation (MG-DA) mechanism to exploit the network's ability to generate stereo-mimicking features given only monocular cues. Both coarse BEV feature-level and fine anchor-level domain adaptation are leveraged for guidance in the monocular domain. In addition, we introduce an IoU matching-based alignment (IoU-MA) method for object-level domain adaptation between the stereo and monocular predictions, to alleviate mismatches while adopting MG-DA. Extensive experiments demonstrate state-of-the-art results on the KITTI and Lyft datasets.
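The abstract gives no implementation details for IoU-MA or MG-DA, so the following is only a minimal illustrative sketch of the two ideas it names: pairing monocular predictions with stereo predictions by IoU, and penalizing the distance between monocular and stereo features. Everything here is an assumption made for illustration, not the authors' method: the function names (`iou`, `match_by_iou`, `mimic_loss`), the use of axis-aligned 2D boxes, the greedy matching strategy, the 0.5 threshold, and the mean-squared-error distance are all hypothetical choices.

```python
from typing import List, Optional, Tuple

# Assumed box format: axis-aligned (x_min, y_min, x_max, y_max), e.g. in a BEV plane.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_by_iou(
    mono: List[Box], stereo: List[Box], threshold: float = 0.5
) -> List[Optional[int]]:
    """Greedily pair each monocular box with the unused stereo box of highest IoU.

    Returns one entry per monocular box: the index of its stereo match,
    or None when no unused stereo box reaches the (assumed) threshold.
    """
    used = set()
    matches: List[Optional[int]] = []
    for m in mono:
        best_j, best_iou = None, threshold
        for j, s in enumerate(stereo):
            if j in used:
                continue
            overlap = iou(m, s)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            used.add(best_j)
        matches.append(best_j)
    return matches


def mimic_loss(mono_feat: List[float], stereo_feat: List[float]) -> float:
    """Mean squared error between flattened monocular and stereo features."""
    if not mono_feat or len(mono_feat) != len(stereo_feat):
        raise ValueError("features must be non-empty and of equal length")
    return sum((m - s) ** 2 for m, s in zip(mono_feat, stereo_feat)) / len(mono_feat)


mono_boxes = [(0.0, 0.0, 2.0, 2.0), (10.0, 10.0, 12.0, 12.0), (20.0, 20.0, 21.0, 21.0)]
stereo_boxes = [(10.0, 10.0, 12.0, 12.0), (0.0, 0.0, 2.0, 2.2)]
pairs = match_by_iou(mono_boxes, stereo_boxes)
```

In this toy input the third monocular box has no overlapping stereo box and is left unmatched, so a feature or prediction alignment term would be applied only to the two matched pairs.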

updated: Thu Feb 24 2022 16:43:36 GMT+0000 (UTC)

published: Fri Dec 03 2021 13:57:14 GMT+0000 (UTC)

arXiv
