A Generalized Multi-Modal Fusion Detection Framework

Leichao Cui; Xiuxian Li; Min Meng; Xiaoyu Mo

LiDAR point clouds have become the most common data source in autonomous driving. However, because point clouds are sparse, accurate and reliable detection cannot be achieved in some scenarios. Images complement point clouds and are therefore receiving increasing attention. Despite some success, existing fusion methods either perform hard fusion or do not fuse in a direct manner. In this paper, we propose MMFusion, a generic 3D detection framework that uses multi-modal features. The framework aims to achieve accurate fusion between LiDAR and images to improve 3D detection in complex scenes. It consists of two separate streams, a LiDAR stream and a camera stream, each of which is compatible with any single-modal feature extraction network. The Voxel Local Perception Module in the LiDAR stream enhances local feature representation, and the Multi-modal Feature Fusion Module then selectively combines the feature outputs of the two streams to achieve better fusion. Extensive experiments show that our framework not only outperforms existing benchmarks but also improves their detection, especially for cyclists and pedestrians on the KITTI benchmarks, with strong robustness and generalization capabilities. We hope our work will stimulate more research into multi-modal fusion for autonomous driving tasks.
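
The abstract does not say how the Multi-modal Feature Fusion Module "selectively combines" the two streams. As a rough illustration only, the sketch below shows one common way to do selective fusion: a learned scalar gate that decides, per location, how much to trust the LiDAR feature versus the camera feature. The function names, the single-gate design, and the weights are all assumptions made for this example; this is not the MMFusion implementation.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(
    lidar_feat: Sequence[float],
    camera_feat: Sequence[float],
    gate_weights: Sequence[float],
    gate_bias: float = 0.0,
) -> List[float]:
    """Hypothetical soft fusion of two aligned feature vectors.

    A scalar gate g in (0, 1) is computed from the concatenated
    features; the output is g * lidar + (1 - g) * camera.
    This is a generic gating scheme, not the paper's module.
    """
    if len(lidar_feat) != len(camera_feat):
        raise ValueError("feature vectors must have the same length")
    if len(gate_weights) != 2 * len(lidar_feat):
        raise ValueError("gate_weights must cover both feature vectors")

    concat = list(lidar_feat) + list(camera_feat)
    logit = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    gate = sigmoid(logit)
    return [gate * l + (1.0 - gate) * c for l, c in zip(lidar_feat, camera_feat)]
```

With all gate weights and the bias set to zero, the gate is 0.5 and the result is the plain average of the two features. A strongly positive bias pushes the output toward the LiDAR feature, a strongly negative one toward the camera feature. In a real detector the gate parameters would be learned, and the features would be tensors over voxels or pixels rather than short lists.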

updated: Mon Mar 13 2023 12:38:07 GMT+0000 (UTC)

published: Mon Mar 13 2023 12:38:07 GMT+0000 (UTC)

arXiv
