3D-FFS: Faster 3D object detection with Focused Frustum Search in sensor fusion based networks

Aniruddha Ganguly; Tasin Ishmam; Khandker Aftarul Islam; Md Zahidur Rahman; Md. Shamsuzzoha Bayzid

In this work, we propose 3D-FFS, a novel approach that makes sensor-fusion-based 3D object detection networks significantly faster using a class of computationally inexpensive heuristics. Existing sensor-fusion-based networks generate 3D region proposals by leveraging inferences from 2D object detectors. However, because images carry no depth information, these networks rely on extracting semantic features of points from the entire scene to locate the object. By leveraging aggregated intrinsic properties (e.g., point density) of the 3D point cloud data, 3D-FFS substantially constrains the 3D search space and thereby significantly reduces training time, inference time, and memory consumption without sacrificing accuracy. To demonstrate the efficacy of 3D-FFS, we integrated it with Frustum ConvNet (F-ConvNet), a prominent sensor-fusion-based 3D object detection model, and assessed its performance on the KITTI dataset. Compared to F-ConvNet, we reduce training and inference times by up to 62.84% and 56.46%, respectively, and memory usage by up to 58.53%. Additionally, we achieve 0.59%, 2.03%, and 3.34% improvements in accuracy for the Car, Pedestrian, and Cyclist classes, respectively. 3D-FFS shows considerable promise in domains with limited computing power, such as autonomous vehicles, drones, and robotics, where LiDAR-camera-based sensor fusion perception systems are widely used.
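The abstract names point density as one property that can be used to narrow the 3D search space, but it does not spell out the heuristic itself. The sketch below is therefore only an illustration of the general idea, not the authors' method: it assumes the points inside a 2D-detection frustum are given as (x, y, z) tuples with z as depth, bins them by depth, and keeps only a window around the most populated bin. The function names and the `bin_width` and `margin_bins` parameters are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Assumed layout: (x, y, z) with z as the depth along the frustum axis.
Point = Tuple[float, float, float]


def densest_depth_window(
    points: Sequence[Point], bin_width: float = 1.0, margin_bins: int = 1
) -> Tuple[float, float]:
    """Return (near, far) depth bounds around the most populated depth bin."""
    if not points:
        raise ValueError("frustum contains no points")
    # Count how many points fall into each fixed-width depth bin.
    counts = Counter(int(p[2] // bin_width) for p in points)
    # The bin with the highest count is taken as the likely object depth.
    peak_bin, _ = counts.most_common(1)[0]
    near = (peak_bin - margin_bins) * bin_width
    far = (peak_bin + 1 + margin_bins) * bin_width
    return near, far


def focus_frustum(
    points: Sequence[Point], bin_width: float = 1.0, margin_bins: int = 1
) -> List[Point]:
    """Keep only the points whose depth lies inside the densest window."""
    near, far = densest_depth_window(points, bin_width, margin_bins)
    return [p for p in points if near <= p[2] < far]


if __name__ == "__main__":
    # Hypothetical frustum: a dense cluster near 10 m plus sparse clutter.
    frustum = [
        (0.1, 0.0, 3.0),
        (0.2, 0.1, 10.2),
        (0.3, 0.1, 10.5),
        (0.2, 0.2, 10.7),
        (0.4, 0.1, 11.1),
        (1.0, 0.5, 30.4),
        (1.1, 0.6, 30.9),
    ]
    print(densest_depth_window(frustum))
    print(len(focus_frustum(frustum)), "of", len(frustum), "points kept")
```

A downstream detector would then process only the retained points, which is where any saving in time and memory would come from. How the window is chosen in practice, and which density statistics are used, is left to the paper.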

updated: Mon Mar 15 2021 11:32:21 GMT+0000 (UTC)

published: Mon Mar 15 2021 11:32:21 GMT+0000 (UTC)

arXiv
