ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection

Danila Rukhovich; Anna Vorontsova; Anton Konushin

In this paper, we introduce the task of multi-view RGB-based 3D object detection as an end-to-end optimization problem. To address this problem, we propose ImVoxelNet, a novel fully convolutional method for 3D object detection based on monocular or multi-view RGB images. The number of monocular images in each multi-view input can vary during training and inference; in fact, this number may be different for every multi-view input. ImVoxelNet successfully handles both indoor and outdoor scenes, which makes it general-purpose. Specifically, it achieves state-of-the-art results in car detection on the KITTI (monocular) and nuScenes (multi-view) benchmarks among all methods that accept RGB images. Moreover, it surpasses existing RGB-based 3D object detection methods on the SUN RGB-D dataset. On ScanNet, ImVoxelNet sets a new benchmark for multi-view 3D object detection. The source code and the trained models are available at https://github.com/saic-vul/imvoxelnet.
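To illustrate how an image-to-voxel projection can accept a varying number of views, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the camera model or the aggregation rule, so the sketch assumes a pinhole camera given as a 3x4 projection matrix, nearest-pixel feature lookup, and a plain average over the views in which a voxel center is visible. The names project_point and voxel_feature are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]       # assumed 3x4 pinhole projection matrix
FeatureMap = List[List[List[float]]]     # indexed as [row][col][channel]


def project_point(P: Matrix, point: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Project a 3D point to pixel coordinates (u, v); None if it lies behind the camera."""
    x, y, z = point
    h = [r[0] * x + r[1] * y + r[2] * z + r[3] for r in P]
    if h[2] <= 0:
        return None
    return h[0] / h[2], h[1] / h[2]


def voxel_feature(point: Sequence[float],
                  views: Sequence[Tuple[Matrix, FeatureMap]]) -> List[float]:
    """Average the 2D features of one voxel center over the views that see it.

    `views` may contain any number (at least one) of (projection, feature map)
    pairs, so the same function serves monocular and multi-view inputs.
    Assumption: voxels seen by no view receive an all-zero feature.
    """
    channels = len(views[0][1][0][0])
    total = [0.0] * channels
    seen = 0
    for P, fmap in views:
        uv = project_point(P, point)
        if uv is None:
            continue
        col, row = math.floor(uv[0]), math.floor(uv[1])
        if 0 <= row < len(fmap) and 0 <= col < len(fmap[0]):
            total = [t + f for t, f in zip(total, fmap[row][col])]
            seen += 1
    return [t / seen for t in total] if seen else total


# Two toy views with 2x2 single-channel feature maps.
P1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
P2 = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]
F1 = [[[0.0], [1.0]], [[2.0], [3.0]]]
F2 = [[[10.0], [10.0]], [[5.0], [10.0]]]

mono = voxel_feature((1, 1, 1), [(P1, F1)])
multi = voxel_feature((1, 1, 1), [(P1, F1), (P2, F2)])
```

Because the average is taken only over the views that actually observe a voxel, the length of the views list is free to change from one input to the next.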

updated: Fri Oct 15 2021 12:54:42 GMT+0000 (UTC)

published: Wed Jun 02 2021 14:20:24 GMT+0000 (UTC)

arXiv
