TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning

Peixiang Huang; Li Liu; Renrui Zhang; Song Zhang; Xinli Xu; Baichao Wang; Guoyi Liu

To achieve accurate and low-cost 3D object detection, existing methods propose to benefit camera-based multi-view detectors with spatial cues provided by the LiDAR modality, e.g., dense depth supervision and bird's-eye-view (BEV) feature distillation. However, they directly conduct point-to-point mimicking from LiDAR to camera, which neglects the inner geometry of foreground targets and suffers from the modal gap between 2D and 3D features. In this paper, we propose a learning scheme, termed TiG-BEV, that transfers Target Inner-Geometry from the LiDAR modality into camera-based BEV detectors for both dense depth and BEV features. First, we introduce an inner-depth supervision module to learn the low-level relative depth relations between different foreground pixels. This enables the camera-based detector to better understand object-wise spatial structures. Second, we design an inner-feature BEV distillation module to imitate the high-level semantics of different keypoints within foreground targets. To further alleviate the BEV feature gap between the two modalities, we adopt both inter-channel and inter-keypoint distillation for feature-similarity modeling. With our target inner-geometry distillation, TiG-BEV boosts BEVDepth by +2.3% NDS and +2.4% mAP, and BEVDet by +9.1% NDS and +10.3% mAP, on the nuScenes val set. Code will be available at https://github.com/ADLab3Ds/TiG-BEV.
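The abstract gives no formulas, so the following is only a minimal NumPy sketch of the two ideas it names, under stated assumptions rather than the authors' implementation. It assumes that relative depth is measured against a single reference pixel per object, that feature similarity is cosine similarity, and that the losses are plain mean squared and mean absolute errors. The function names `inner_depth_loss`, `cosine_similarity_matrix`, and `inner_feature_distill_loss` are hypothetical and do not come from the TiG-BEV repository.

```python
import numpy as np


def inner_depth_loss(pred_depth, lidar_depth, ref_index=0):
    """Compare relative depths inside one foreground object.

    pred_depth, lidar_depth: 1D arrays with the depth of each foreground
    pixel of a single object (camera prediction and LiDAR ground truth).
    ref_index: index of the reference pixel (an assumed, hypothetical choice).
    """
    pred_rel = pred_depth - pred_depth[ref_index]
    lidar_rel = lidar_depth - lidar_depth[ref_index]
    # Assumption: mean squared error between the relative depths.
    return float(np.mean((pred_rel - lidar_rel) ** 2))


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x: (n, d) -> (n, n)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.maximum(norms, 1e-12)  # guard against zero-norm rows
    return unit @ unit.T


def inner_feature_distill_loss(student_kp, teacher_kp):
    """Match similarity structure instead of raw features.

    student_kp, teacher_kp: (K, C) arrays of BEV features sampled at K
    keypoints of one target (camera student and LiDAR teacher).
    Returns (inter_keypoint_loss, inter_channel_loss).
    """
    # Inter-keypoint: K x K similarities between keypoints.
    kp_diff = cosine_similarity_matrix(student_kp) - cosine_similarity_matrix(teacher_kp)
    # Inter-channel: C x C similarities between channels.
    ch_diff = cosine_similarity_matrix(student_kp.T) - cosine_similarity_matrix(teacher_kp.T)
    # Assumption: mean absolute error on each similarity matrix.
    return float(np.mean(np.abs(kp_diff))), float(np.mean(np.abs(ch_diff)))
```

In this sketch a constant depth offset over the whole object leaves `inner_depth_loss` at zero, and rescaling the student features leaves both similarity losses at zero. That illustrates, under the assumptions above, how supervising relations within a target differs from point-to-point mimicking of absolute values.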

updated: Wed Dec 28 2022 17:53:43 GMT+0000 (UTC)

published: Wed Dec 28 2022 17:53:43 GMT+0000 (UTC)

arXiv
