Language-Guided 3D Object Detection in Point Cloud for Autonomous Driving

Wenhao Cheng; Junbo Yin; Wei Li; Ruigang Yang; Jianbing Shen

This paper addresses the problem of 3D referring expression comprehension (REC) in autonomous driving scenarios, which aims to ground a natural language expression to the targeted region in LiDAR point clouds. Previous approaches to REC usually focus on the 2D or 3D indoor domain and are not suited to accurately predicting the location of the queried 3D region in an autonomous driving scene. In addition, the upper-bound limitation and heavy computation cost of existing approaches motivate us to explore a better solution. In this work, we propose a new multi-modal visual grounding task, termed LiDAR Grounding. We then devise a Multi-modal Single Shot Grounding (MSSG) approach with an effective token fusion strategy. It jointly learns the LiDAR-based object detector with the language features and predicts the targeted region directly from the detector without any post-processing. Moreover, image features can be flexibly integrated into our approach to provide rich texture and color information. The cross-modal learning encourages the detector to concentrate on important regions in the point cloud by considering the informative language expressions, thus leading to much better accuracy and efficiency. Extensive experiments on the Talk2Car dataset demonstrate the effectiveness of the proposed methods. Our work offers deeper insight into the LiDAR-based grounding task, and we expect it to present a promising direction for the autonomous driving community.
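The single-shot idea in the abstract (fuse language features into the detector, then read the target region directly from the detector output) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the bird's-eye-view grid `bev`, the sentence vector, the element-wise product used as fusion, and the names `fuse` and `ground` are hypothetical and are not taken from the paper, whose actual token fusion strategy is not described in this abstract.

```python
from typing import List, Tuple

Vector = List[float]


def fuse(cell: Vector, sentence: Vector) -> Vector:
    """Gate a grid-cell feature with the sentence embedding.

    Assumption: an element-wise product stands in for the paper's
    token fusion, which the abstract does not specify.
    """
    return [c * s for c, s in zip(cell, sentence)]


def ground(bev: List[List[Vector]], sentence: Vector) -> Tuple[int, int]:
    """Return the (row, col) of the single highest-scoring fused cell.

    One prediction is read directly from the fused grid, with no
    proposal matching or other post-processing step.
    """
    best_score = float("-inf")
    best_cell = (0, 0)
    for r, row in enumerate(bev):
        for c, cell in enumerate(row):
            score = sum(fuse(cell, sentence))
            if score > best_score:
                best_score, best_cell = score, (r, c)
    return best_cell


# Hypothetical 2x2 grid with 2-dimensional features per cell.
bev = [
    [[0.1, 0.9], [0.8, 0.2]],
    [[0.5, 0.5], [0.3, 0.1]],
]
print(ground(bev, [1.0, 0.0]))
```

Changing the sentence vector changes which cell wins, which is the sense in which the language expression steers the detector toward a region.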

updated: Thu May 25 2023 06:22:10 GMT+0000 (UTC)

published: Thu May 25 2023 06:22:10 GMT+0000 (UTC)

arXiv

