Multi-View Adaptive Fusion Network for 3D Object Detection

Guojun Wang; Bin Tian; Yachen Zhang; Long Chen; Dongpu Cao; Jian Wu

3D object detection based on LiDAR-camera fusion is an emerging research theme for autonomous driving. However, effectively fusing the two modalities without information loss or interference has proven surprisingly difficult. To address this issue, we propose a single-stage multi-view fusion framework that takes the LiDAR bird's-eye view, the LiDAR range view, and camera view images as inputs for 3D object detection. To fuse multi-view features effectively, we propose an attentive pointwise fusion (APF) module that uses attention mechanisms to estimate the importance of the three sources, achieving adaptive fusion of multi-view features in a pointwise manner. Furthermore, an attentive pointwise weighting (APW) module is designed to help the network learn structure information and point feature importance through two extra tasks, namely foreground classification and center regression; the predicted foreground probability is used to reweight the point features. We design an end-to-end learnable network named MVAF-Net to integrate these two components. Evaluations on the KITTI 3D object detection dataset demonstrate that the proposed APF and APW modules offer significant performance gains. Moreover, the proposed MVAF-Net achieves the best performance among all single-stage fusion methods and outperforms most two-stage fusion methods, achieving the best trade-off between speed and accuracy on the KITTI benchmark.
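The two ideas named in the abstract, attention-weighted fusion of three per-point feature sources and reweighting of point features by a predicted foreground probability, can be illustrated with a minimal pure-Python sketch. This is not the MVAF-Net implementation: the abstract does not specify the layer design, so the softmax over three scalar scores, the weighted sum, and every name below (`attentive_pointwise_fusion`, `reweight_by_foreground`, `score_weights`, `fg_logits`) are assumptions made for illustration only. In a real network the scoring weights and foreground logits would come from learned layers.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a short list of scalars."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attentive_pointwise_fusion(bev, rv, cam, score_weights):
    """Fuse three per-point feature sources with per-point attention.

    bev, rv, cam: lists of N feature vectors (each of length C), one per point,
        standing in for bird's-eye-view, range-view, and camera-view features.
    score_weights: three length-C vectors, a hypothetical stand-in for a
        learned layer that scores each view's feature vector.
    Returns N fused vectors of length C (assumed here to be a weighted sum).
    """
    fused = []
    for views in zip(bev, rv, cam):
        # One scalar importance score per view for this point.
        scores = [dot(w, f) for w, f in zip(score_weights, views)]
        alphas = softmax(scores)
        channels = len(views[0])
        fused.append(
            [sum(a * f[c] for a, f in zip(alphas, views)) for c in range(channels)]
        )
    return fused


def reweight_by_foreground(features, fg_logits):
    """Scale each point's features by its predicted foreground probability.

    fg_logits: one scalar per point, a hypothetical output of a foreground
        classification head; the sigmoid turns it into a probability.
    """
    reweighted = []
    for feat, logit in zip(features, fg_logits):
        prob = 1.0 / (1.0 + math.exp(-logit))
        reweighted.append([prob * x for x in feat])
    return reweighted


if __name__ == "__main__":
    bev = [[3.0, 0.0]]
    rv = [[0.0, 3.0]]
    cam = [[3.0, 3.0]]
    # Zero scoring weights give every view the same importance.
    uniform = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    fused = attentive_pointwise_fusion(bev, rv, cam, uniform)
    print(fused)
    print(reweight_by_foreground(fused, [0.0]))
```

With all-zero scoring weights the three views receive equal attention, so the fused vector is their mean; a foreground logit of 0 corresponds to probability 0.5 and halves the point's features. Using a weighted sum keeps the fused dimensionality equal to that of each input; concatenating the weighted features is another plausible design.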

updated: Tue Dec 08 2020 03:54:51 GMT+0000 (UTC)

published: Mon Nov 02 2020 00:06:01 GMT+0000 (UTC)

arXiv
