TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers

Xuyang Bai; Zeyu Hu; Xinge Zhu; Qingqiu Huang; Yilun Chen; Hongbo Fu; Chiew-Lan Tai

LiDAR and cameras are two important sensors for 3D object detection in autonomous driving. Despite the increasing popularity of sensor fusion in this field, robustness to inferior image conditions, e.g., bad illumination and sensor misalignment, remains under-explored. Existing fusion methods are easily affected by such conditions, mainly because they rely on a hard association between LiDAR points and image pixels, established by calibration matrices. We propose TransFusion, a robust solution to LiDAR-camera fusion with a soft-association mechanism to handle inferior image conditions. Specifically, TransFusion consists of convolutional backbones and a detection head based on a transformer decoder. The first layer of the decoder predicts initial bounding boxes from a LiDAR point cloud using a sparse set of object queries, and the second layer adaptively fuses the object queries with useful image features, leveraging both spatial and contextual relationships. The attention mechanism of the transformer enables the model to adaptively determine where and what information should be taken from the image, leading to a robust and effective fusion strategy. We additionally design an image-guided query initialization strategy to deal with objects that are difficult to detect in point clouds. TransFusion achieves state-of-the-art performance on large-scale datasets. We provide extensive experiments to demonstrate its robustness against degraded image quality and calibration errors. We also extend the proposed method to the 3D tracking task and achieve 1st place on the nuScenes tracking leaderboard, showing its effectiveness and generalization capability.
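
The contrast between hard and soft association described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it omits learned projections, multi-head attention, and any spatial restriction, and the function names `cross_attend` and `hard_associate` are hypothetical. It only shows that, with attention, one object query draws on every image feature with weights that depend on similarity, whereas a hard association reads the single pixel selected by calibration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, keys, values):
    """Scaled dot-product attention for one object query (soft association).

    query:  feature vector of one object query (assumed to come from LiDAR)
    keys:   one vector per image location, same length as query
    values: one vector per image location
    Returns the attention weights and the fused feature vector.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    fused = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, fused


def hard_associate(pixel_index, values):
    """Hard association: copy the feature at the one calibrated pixel."""
    return list(values[pixel_index])


# Toy data (made up for illustration): three image locations, 2-D keys.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0], [20.0], [30.0]]

weights, fused = cross_attend(query, keys, values)
print("attention weights:", weights)
print("soft-fused feature:", fused)

# If calibration error points the query at the wrong pixel, the hard
# association returns that pixel's feature unchanged.
print("hard feature at miscalibrated pixel 2:", hard_associate(2, values))
```

In this sketch the attention weights are determined by feature similarity rather than by a projected pixel index, so a wrong index alone does not change the fused feature. This is only an intuition for the robustness claim above, not evidence for it.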

updated: Tue Mar 22 2022 07:15:13 GMT+0000 (UTC)

published: Tue Mar 22 2022 07:15:13 GMT+0000 (UTC)

arXiv
