FusionPainting: Multimodal Fusion with Adaptive Attention for 3D Object Detection

Shaoqing Xu; Dingfu Zhou; Jin Fang; Junbo Yin; Zhou Bin; Liangjun Zhang

Accurate detection of obstacles in 3D is an essential task for autonomous driving and intelligent transportation. In this work, we propose FusionPainting, a general multimodal fusion framework that fuses 2D RGB images and 3D point clouds at the semantic level to boost 3D object detection. Specifically, the FusionPainting framework consists of three main modules: a multi-modal semantic segmentation module, an adaptive attention-based semantic fusion module, and a 3D object detector. First, semantic information is obtained for the 2D images and the 3D LiDAR point clouds using 2D and 3D segmentation approaches. Then, the segmentation results from the different sensors are adaptively fused by the proposed attention-based semantic fusion module. Finally, the point clouds painted with the fused semantic labels are sent to the 3D detector to obtain the 3D detection results. The effectiveness of the proposed framework has been verified on the large-scale nuScenes detection benchmark by comparing it with three different baselines. The experimental results show that the fusion strategy significantly improves detection performance compared with methods that use only point clouds and with methods that use point clouds painted with 2D segmentation information alone. Furthermore, the proposed approach outperforms other state-of-the-art methods on the nuScenes testing benchmark.
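The fusion and painting steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the attention architecture, so the per-point linear gate below, the function name `fuse_and_paint`, and the parameters `w` and `b` are all assumptions chosen for illustration. The sketch only shows the general idea of weighting 2D and 3D segmentation scores per point and appending the fused scores to the point features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_and_paint(points, sem_2d, sem_3d, w, b):
    """Hypothetical per-point attention fusion (an assumption, not the paper's module).

    points: (N, D) raw point features, e.g. x, y, z, intensity
    sem_2d: (N, C) class scores projected from the 2D image segmentation
    sem_3d: (N, C) class scores from the 3D point cloud segmentation
    w, b:   (2*C, 2) and (2,) parameters of an illustrative linear gate
    """
    # One attention logit per modality for each point.
    logits = np.concatenate([sem_2d, sem_3d], axis=1) @ w + b  # (N, 2)
    attn = softmax(logits, axis=1)                             # rows sum to 1
    # Convex combination of the two score vectors for each point.
    fused = attn[:, :1] * sem_2d + attn[:, 1:] * sem_3d        # (N, C)
    # "Painting": append the fused semantic scores to the point features.
    painted = np.concatenate([points, fused], axis=1)          # (N, D + C)
    return painted, attn

# Toy usage with random data; in practice w and b would be learned.
rng = np.random.default_rng(0)
n_points, n_feat, n_cls = 5, 4, 3
points = rng.normal(size=(n_points, n_feat))
sem_2d = softmax(rng.normal(size=(n_points, n_cls)))
sem_3d = softmax(rng.normal(size=(n_points, n_cls)))
w = rng.normal(size=(2 * n_cls, 2))
b = np.zeros(2)
painted, attn = fuse_and_paint(points, sem_2d, sem_3d, w, b)
```

The painted array would then be passed to a 3D detector in place of the raw point cloud. Because the attention weights for each point sum to one, the fused scores remain a valid probability distribution whenever both inputs are.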

updated: Tue Aug 10 2021 02:51:05 GMT+0000 (UTC)

published: Wed Jun 23 2021 14:53:22 GMT+0000 (UTC)

arXiv
