Cross Modal Transformer: Towards Fast and Robust 3D Object Detection

Junjie Yan; Yingfei Liu; Jianjian Sun; Fan Jia; Shuailin Li; Tiancai Wang; Xiangyu Zhang

In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without an explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The multi-modal tokens are spatially aligned by encoding 3D points into the multi-modal features. The core design of CMT is simple, yet it performs strongly: it achieves 74.1% NDS (state-of-the-art for a single model) on the nuScenes test set while maintaining faster inference speed. Moreover, CMT remains strongly robust even when LiDAR input is missing. Code is released at https://github.com/junjie18/CMT.
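The alignment idea in the abstract (give image and point cloud tokens embeddings derived from 3D coordinates, so one transformer can attend across both without an explicit view transformation) can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption for illustration, not CMT's actual implementation: random two-layer MLPs stand in for learned coordinate encoders, image tokens are encoded from points sampled along each pixel's camera ray with identity extrinsics, point cloud tokens are assumed to be BEV cells encoded from their (x, y) positions, and object queries come from 3D anchor points. The names `mlp`, `image_ray_points`, `cross_modal_decode`, `C` and `DEPTHS` are hypothetical.

```python
import numpy as np

C = 32                               # shared embedding width (hypothetical)
DEPTHS = np.linspace(1.0, 60.0, 8)   # depth samples along each camera ray (hypothetical)
rng = np.random.default_rng(0)


def mlp(in_dim, out_dim=C):
    """Random 2-layer ReLU MLP standing in for a learned coordinate encoder."""
    w1 = rng.normal(scale=in_dim ** -0.5, size=(in_dim, out_dim))
    w2 = rng.normal(scale=out_dim ** -0.5, size=(out_dim, out_dim))
    return lambda x: np.maximum(x @ w1, 0.0) @ w2


pe_img = mlp(3 * len(DEPTHS))  # encodes the 3D ray points of an image token
pe_bev = mlp(2)                # encodes the (x, y) position of a BEV token
pe_query = mlp(3)              # encodes the 3D anchor of an object query


def image_ray_points(uv, k_inv):
    """Back-project pixels (N, 2) to 3D points at every depth in DEPTHS.

    Assumes the camera frame is the shared 3D frame (identity extrinsics).
    Returns an (N, 3 * len(DEPTHS)) array of flattened coordinates.
    """
    homo = np.concatenate([uv, np.ones((len(uv), 1))], axis=1)  # (N, 3)
    rays = homo @ k_inv.T                                        # (N, 3), z == 1
    pts = rays[:, None, :] * DEPTHS[None, :, None]               # (N, D, 3)
    return pts.reshape(len(uv), -1)


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def cross_modal_decode(img_feat, uv, k_inv, bev_feat, bev_xy, anchors, query_feat):
    """One cross-attention step of object queries over all multi-modal tokens."""
    img_keys = img_feat + pe_img(image_ray_points(uv, k_inv))  # (N_img, C)
    bev_keys = bev_feat + pe_bev(bev_xy)                       # (N_bev, C)
    keys = np.concatenate([img_keys, bev_keys])                # one shared token sequence
    values = np.concatenate([img_feat, bev_feat])
    q = query_feat + pe_query(anchors)                         # (Q, C)
    weights = softmax(q @ keys.T / np.sqrt(C))                 # (Q, N_img + N_bev)
    return weights @ values, weights


# Example: 100 image tokens from one 640x480 camera, 64 BEV tokens, 10 queries.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
uv = rng.uniform([0.0, 0.0], [640.0, 480.0], size=(100, 2))
img_feat = rng.normal(size=(100, C))
bev_xy = rng.uniform(-50.0, 50.0, size=(64, 2))
bev_feat = rng.normal(size=(64, C))
anchors = rng.uniform(-50.0, 50.0, size=(10, 3))
queries = np.zeros((10, C))

out, w = cross_modal_decode(img_feat, uv, np.linalg.inv(K), bev_feat, bev_xy, anchors, queries)
print(out.shape, w.shape)  # (10, 32) (10, 164)

# Camera-only input: simply pass zero BEV tokens.
out_cam, w_cam = cross_modal_decode(img_feat, uv, np.linalg.inv(K),
                                    np.zeros((0, C)), np.zeros((0, 2)), anchors, queries)
print(out_cam.shape, w_cam.shape)  # (10, 32) (10, 100)
```

Because both modalities end up as rows of one token sequence, the decoder in this sketch still runs when the LiDAR tokens are absent. How CMT is trained to stay accurate in that case is not shown here.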

updated: Sun Mar 12 2023 07:56:36 GMT+0000 (UTC)

published: Tue Jan 03 2023 18:36:52 GMT+0000 (UTC)

arXiv