Voxel-MAE: Masked Autoencoders for Pre-training Large-scale Point Clouds

Chen Min; Xinli Xu; Dawei Zhao; Liang Xiao; Yiming Nie; Bin Dai

Mask-based pre-training has achieved great success in self-supervised learning for images and language, without manually annotated supervision. However, it has not yet been studied for large-scale point clouds, which carry highly redundant spatial information. In this work, we propose a masked voxel autoencoder network for pre-training on large-scale point clouds, dubbed Voxel-MAE. Our key idea is to transform the point cloud into a voxel representation and classify whether each voxel contains points. This simple but effective strategy makes the network aware of object shape at the voxel level, thus improving performance on downstream tasks such as 3D object detection. Because large-scale point clouds are highly spatially redundant, Voxel-MAE can still learn representative features even with a 90% masking ratio. We also validate Voxel-MAE on unsupervised domain adaptation tasks, which demonstrates its generalization ability. Voxel-MAE shows that it is feasible to pre-train on large-scale point clouds without data annotations to enhance the perception ability of autonomous vehicles. Extensive experiments with 3D object detectors (SECOND, CenterPoint, and PV-RCNN) on three popular datasets (KITTI, Waymo, and nuScenes) show the effectiveness of our pre-training method.
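
The pretext task described in the abstract (voxelize, mask most voxels, classify occupancy) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the uniform random masking of occupied voxels, the dense grid, and the binary cross-entropy loss are all assumptions chosen for illustration, and the encoder-decoder network that would produce the logits is left out.

```python
import numpy as np

def voxel_occupancy(points, grid_min, voxel_size, grid_shape):
    """Return a boolean grid marking voxels that contain at least one point."""
    idx = np.floor((points - grid_min) / voxel_size).astype(np.int64)
    # Drop points that fall outside the grid.
    inside = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    occ = np.zeros(grid_shape, dtype=bool)
    occ[tuple(idx[inside].T)] = True
    return occ

def mask_occupied_voxels(occ, mask_ratio, rng):
    """Hide a random fraction of occupied voxels (assumed uniform masking).

    Returns (visible, masked): the occupancy grid the encoder would see,
    and the grid of voxels that were hidden from it.
    """
    occupied = np.flatnonzero(occ)
    n_mask = int(round(mask_ratio * occupied.size))
    masked_idx = rng.choice(occupied, size=n_mask, replace=False)
    masked = np.zeros(occ.size, dtype=bool)
    masked[masked_idx] = True
    masked = masked.reshape(occ.shape)
    return occ & ~masked, masked

def occupancy_bce(logits, occ):
    """Mean binary cross-entropy between predicted logits and true occupancy."""
    y = occ.astype(np.float64)
    # Numerically stable form: max(x, 0) - x*y + log(1 + exp(-|x|)).
    loss = np.maximum(logits, 0) - logits * y + np.log1p(np.exp(-np.abs(logits)))
    return float(loss.mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.uniform(0.0, 10.0, size=(2000, 3))  # synthetic stand-in for a LiDAR sweep
    occ = voxel_occupancy(points, np.zeros(3), 1.0, (10, 10, 10))
    visible, masked = mask_occupied_voxels(occ, mask_ratio=0.9, rng=rng)
    # A real model would map `visible` to logits; zeros stand in for an untrained network.
    logits = np.zeros(occ.shape)
    print(occ.sum(), visible.sum(), masked.sum(), occupancy_bce(logits, occ))
```

In this sketch the loss is computed against the full occupancy grid, so the network would have to recover the hidden voxels from the 10% it can see. With all-zero logits the loss equals log 2, the value for an uninformative predictor.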

updated: Tue Aug 16 2022 14:16:21 GMT+0000 (UTC)

published: Mon Jun 20 2022 17:15:50 GMT+0000 (UTC)

arXiv
