Enabling Efficient Deep Convolutional Neural Network-based Sensor Fusion for Autonomous Driving

Xiaoming Zeng; Zhendong Wang; Yang Hu

Autonomous driving demands accurate perception and safe decision-making. To achieve this, automated vehicles are now equipped with multiple sensors (e.g., camera and Lidar), enabling them to exploit complementary environmental context by fusing data from different sensing modalities. With the success of Deep Convolutional Neural Networks (DCNNs), fusion between DCNNs has proven to be a promising strategy for achieving satisfactory perception accuracy. However, mainstream DCNN fusion schemes perform fusion by directly adding, element-wise, the feature maps extracted from different modalities at various stages, without considering whether the features being fused are matched. Therefore, we first propose a feature disparity metric to quantitatively measure the degree of feature disparity between the feature maps being fused. We then propose Fusion-filter, a feature-matching technique that tackles the feature-mismatching issue. We also propose a Layer-sharing technique in the deep layers that achieves better accuracy with less computational overhead. With the feature disparity also serving as an additional loss term, our proposed techniques enable a DCNN to learn corresponding feature maps with similar characteristics and complementary visual context from different modalities, and thus to achieve better accuracy. Experimental results demonstrate that our proposed fusion technique achieves better accuracy on the KITTI dataset while demanding fewer computational resources.
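
To make the baseline concrete, the following is a minimal sketch of element-wise additive fusion of two same-shaped feature maps, together with one possible way to quantify how mismatched they are and to add that quantity to a training loss. The abstract does not define the feature disparity metric, Fusion-filter, or Layer-sharing, so everything beyond the element-wise addition is an assumption made for illustration: the function names, the cosine-based disparity, and the `weight` parameter are hypothetical and are not the authors' formulation.

```python
from math import sqrt


def fuse_elementwise(cam, lidar):
    """Baseline fusion: add two same-shaped 2D feature maps element by element."""
    same_rows = len(cam) == len(lidar)
    if not same_rows or any(len(a) != len(b) for a, b in zip(cam, lidar)):
        raise ValueError("feature maps must have the same shape")
    return [[a + b for a, b in zip(row_c, row_l)]
            for row_c, row_l in zip(cam, lidar)]


def illustrative_disparity(cam, lidar):
    """Hypothetical disparity: 1 - cosine similarity of the flattened maps.

    This is an assumed stand-in, NOT the metric proposed in the paper.
    0.0 means the maps point in the same direction; larger means more mismatch.
    """
    x = [v for row in cam for v in row]
    y = [v for row in lidar for v in row]
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 1.0  # treat an all-zero map as maximally mismatched
    return 1.0 - dot / (norm_x * norm_y)


def total_loss(task_loss, disparity, weight=0.1):
    """Task loss plus a weighted disparity term (weight is an assumed value)."""
    return task_loss + weight * disparity


if __name__ == "__main__":
    camera_map = [[1.0, 2.0], [3.0, 4.0]]
    lidar_map = [[10.0, 20.0], [30.0, 40.0]]
    print(fuse_elementwise(camera_map, lidar_map))
    print(illustrative_disparity(camera_map, lidar_map))
```

In this sketch the two example maps differ only in scale, so the assumed disparity is close to zero even though plain addition lets the larger-magnitude map dominate the fused result. A real metric would need to be chosen with such effects in mind.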

updated: Tue Feb 22 2022 23:35:30 GMT+0000 (UTC)

published: Tue Feb 22 2022 23:35:30 GMT+0000 (UTC)

arXiv
