Point-Voxel CNN for Efficient 3D Deep Learning

Zhijian Liu; Haotian Tang; Yujun Lin; Song Han

We present Point-Voxel CNN (PVCNN) for efficient, fast 3D deep learning. Previous work processes 3D data using either voxel-based or point-based NN models. However, both approaches are computationally inefficient. The computation cost and memory footprint of voxel-based models grow cubically with the input resolution, making it memory-prohibitive to scale up the resolution. As for point-based networks, up to 80% of the time is wasted on structuring the sparse data, which have poor memory locality, rather than on the actual feature extraction. In this paper, we propose PVCNN, which represents the 3D input data as points to reduce memory consumption, while performing the convolutions in voxels to reduce irregular, sparse data access and improve locality. Our PVCNN model is both memory- and computation-efficient. Evaluated on semantic and part segmentation datasets, it achieves much higher accuracy than the voxel-based baseline with a 10x GPU memory reduction; it also outperforms the state-of-the-art point-based models with a 7x measured speedup on average. Remarkably, a narrower version of PVCNN achieves a 2x speedup over PointNet (an extremely efficient model) on part and scene segmentation benchmarks with much higher accuracy. We validate the general effectiveness of PVCNN on 3D object detection: by replacing the primitives in Frustum PointNet with PVConv, it outperforms Frustum PointNet++ by 2.4% mAP on average with a 1.5x measured speedup and GPU memory reduction.
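
The abstract's central idea, keeping the data as points while running the convolutions on voxels, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (voxelize, conv3d, devoxelize, pvconv) are hypothetical. The sketch also assumes four simplifications: a cube normalization, nearest-voxel devoxelization (PVCNN itself uses trilinear interpolation), a single per-point linear layer as the point branch, and additive fusion of the two branches. The voxel resolution r can stay low because grid cells grow cubically: r = 32 gives 32,768 cells, while r = 64 gives 262,144.

```python
import numpy as np


def voxelize(coords, feats, r):
    """Average point features into an r x r x r voxel grid.

    coords: (N, 3) point coordinates; feats: (N, C) point features.
    Returns the grid (r, r, r, C) and each point's voxel index (N, 3).
    """
    # Assumption: center the cloud and scale it into the unit cube by its
    # largest absolute coordinate, then map [0, 1] onto voxel indices 0..r-1.
    centered = coords - coords.mean(axis=0)
    normed = centered / (2 * np.abs(centered).max() + 1e-8) + 0.5
    idx = np.clip(np.round(normed * (r - 1)).astype(int), 0, r - 1)
    grid = np.zeros((r, r, r, feats.shape[1]))
    counts = np.zeros((r, r, r, 1))
    cells = (idx[:, 0], idx[:, 1], idx[:, 2])
    np.add.at(grid, cells, feats)  # sum features of points sharing a voxel
    np.add.at(counts, cells, 1)    # number of points per voxel
    return grid / np.maximum(counts, 1), idx  # empty voxels stay zero


def conv3d(grid, weight):
    """Dense k x k x k convolution (cross-correlation), stride 1, zero padding.

    grid: (r, r, r, C_in); weight: (k, k, k, C_in, C_out) with odd k.
    """
    k = weight.shape[0]
    p = k // 2
    r = grid.shape[0]
    padded = np.pad(grid, ((p, p), (p, p), (p, p), (0, 0)))
    out = np.zeros((r, r, r, weight.shape[-1]))
    for dx in range(k):
        for dy in range(k):
            for dz in range(k):
                # Regular, contiguous slice of neighbors times this tap's C_in x C_out matrix.
                window = padded[dx:dx + r, dy:dy + r, dz:dz + r]
                out += window @ weight[dx, dy, dz]
    return out


def devoxelize(grid, idx):
    # Simplification: nearest-voxel lookup; PVCNN itself uses trilinear interpolation.
    return grid[idx[:, 0], idx[:, 1], idx[:, 2]]


def pvconv(coords, feats, r, conv_w, mlp_w):
    """Hypothetical PVConv-style block: coarse voxel branch + fine point branch."""
    grid, idx = voxelize(coords, feats, r)
    coarse = devoxelize(np.maximum(conv3d(grid, conv_w), 0), idx)  # conv + ReLU on voxels
    fine = np.maximum(feats @ mlp_w, 0)  # shared per-point linear layer + ReLU
    return coarse + fine  # assumption: branches fused by addition


rng = np.random.default_rng(0)
coords = rng.uniform(-1, 1, size=(1024, 3))
feats = rng.normal(size=(1024, 4))
out = pvconv(coords, feats, r=8,
             conv_w=rng.normal(size=(3, 3, 3, 4, 16)),
             mlp_w=rng.normal(size=(4, 16)))
print(out.shape)  # (1024, 16)
```

In this sketch the voxel branch reads each neighborhood through dense, regularly strided array slices instead of per-point neighbor searches, which mirrors the memory-locality argument above, while the per-point branch keeps detail that the coarse grid averages away.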

updated: Mon Dec 09 2019 20:46:48 GMT+0000 (UTC)

published: Mon Jul 08 2019 17:48:45 GMT+0000 (UTC)

arXiv
