Voxel-MAE: Masked Autoencoders for Pre-training Large-scale Point Clouds

Chen Min; Dawei Zhao; Liang Xiao; Yiming Nie; Bin Dai

Mask-based pre-training has achieved great success in self-supervised learning for images, video, and language, without manually annotated supervision. However, it has not yet been studied in the field of 3D object detection, where the point clouds are information-redundant data. Because the point clouds in 3D object detection are large-scale, it is impossible to reconstruct the input point clouds. In this paper, we propose a masked voxel classification network for pre-training on large-scale point clouds. Our key idea is to divide the point clouds into voxel representations and classify whether each voxel contains points. This simple strategy makes the network voxel-aware of object shape, thus improving the performance of 3D object detection. Extensive experiments show the effectiveness of our pre-trained model with 3D object detectors (SECOND, CenterPoint, and PV-RCNN) on three popular datasets (KITTI, Waymo, and nuScenes). Code is publicly available at https://github.com/chaytonmin/Voxel-MAE.
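The key idea in the abstract (voxelize the point cloud, hide some voxels, and classify whether each voxel is occupied) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the uniform random masking, the 0.5 mask ratio, the 4x4x4 grid, and the plain binary cross-entropy objective are all assumptions made for illustration, and the placeholder logits stand in for the output of a real network.

```python
import numpy as np

def voxelize_occupancy(points, grid_min, voxel_size, grid_shape):
    """Boolean occupancy grid: True where a voxel contains at least one point."""
    idx = np.floor((points - grid_min) / voxel_size).astype(np.int64)
    in_range = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    idx = idx[in_range]  # drop points that fall outside the grid
    occ = np.zeros(grid_shape, dtype=bool)
    occ[idx[:, 0], idx[:, 1], idx[:, 2]] = True
    return occ

def mask_occupied_voxels(occ, mask_ratio, rng):
    """Hide a random fraction of the occupied voxels (hypothetical masking rule)."""
    occupied = np.argwhere(occ)
    n_mask = int(round(mask_ratio * len(occupied)))
    chosen = occupied[rng.permutation(len(occupied))[:n_mask]]
    hidden = np.zeros_like(occ)
    hidden[chosen[:, 0], chosen[:, 1], chosen[:, 2]] = True
    visible = occ & ~hidden  # what an encoder would be allowed to see
    return visible, hidden

def occupancy_bce(logits, target):
    """Mean binary cross-entropy between occupancy logits and the true grid."""
    t = target.astype(np.float64)
    # Numerically stable form: max(x, 0) - x * t + log(1 + exp(-|x|))
    loss = np.maximum(logits, 0) - logits * t + np.log1p(np.exp(-np.abs(logits)))
    return loss.mean()

# Toy usage: 200 random points in a 4 m cube, 1 m voxels.
rng = np.random.default_rng(0)
points = rng.uniform(0.0, 4.0, size=(200, 3))
occ = voxelize_occupancy(points, np.zeros(3), 1.0, (4, 4, 4))
visible, hidden = mask_occupied_voxels(occ, mask_ratio=0.5, rng=rng)

# Placeholder prediction: all-zero logits stand in for a network's output.
logits = np.zeros(occ.shape)
loss = occupancy_bce(logits, occ)
```

In a real pre-training setup, a network would take the `visible` grid as input and be trained so that its logits recover the full occupancy `occ`, including the `hidden` voxels. Framing the target as per-voxel occupancy rather than point coordinates is what lets the objective avoid reconstructing the raw large-scale point cloud.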

updated: Mon Jun 20 2022 17:15:50 GMT+0000 (UTC)

published: Mon Jun 20 2022 17:15:50 GMT+0000 (UTC)

arXiv
