3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining

Siming Yan; Yuqi Yang; Yuxiao Guo; Hao Pan; Peng-shuai Wang; Xin Tong; Yang Liu; Qixing Huang

Masked autoencoders (MAEs) have recently been introduced to 3D self-supervised pretraining for point clouds, following their great success in NLP and computer vision. Unlike MAEs used in the image domain, where the pretext task is to restore features at the masked pixels, such as colors, existing 3D MAE works reconstruct only the missing geometry, i.e., the locations of the masked points. In contrast to previous studies, we argue that recovering point locations is inessential and that restoring intrinsic point features is far more effective. To this end, we propose to ignore point position reconstruction and instead recover high-order features at the masked points, including surface normals and surface variations, through a novel attention-based decoder that is independent of the encoder design. We validate the effectiveness of our pretext task and decoder design using different encoder structures for 3D training and demonstrate the advantages of our pretrained networks on various point cloud analysis tasks.
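The abstract names surface normals and surface variations as the prediction targets but does not say how they are computed. The sketch below assumes one common definition based on local PCA: the normal is the eigenvector of the neighborhood covariance with the smallest eigenvalue, and the surface variation is that eigenvalue divided by the sum of all three. The function name `normals_and_variation`, the neighborhood size `k`, and the brute-force neighbor search are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def normals_and_variation(points, k=8):
    """Estimate per-point unit normals and surface variation via local PCA.

    Assumed definition (not confirmed by the abstract): for eigenvalues
    l0 <= l1 <= l2 of the k-neighbourhood covariance, the normal is the
    eigenvector of l0 and the surface variation is l0 / (l0 + l1 + l2).
    """
    points = np.asarray(points, dtype=np.float64)
    # Brute-force pairwise squared distances; fine for small clouds only.
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    # Indices of the k nearest neighbours (each point includes itself).
    idx = np.argsort(d2, axis=1)[:, :k]
    nbrs = points[idx]                                    # (n, k, 3)
    centered = nbrs - nbrs.mean(axis=1, keepdims=True)
    cov = np.einsum("nki,nkj->nij", centered, centered) / k  # (n, 3, 3)
    # eigh returns eigenvalues in ascending order, eigenvectors as columns.
    evals, evecs = np.linalg.eigh(cov)
    normals = evecs[:, :, 0]          # sign is ambiguous (+n or -n)
    variation = evals[:, 0] / np.maximum(evals.sum(axis=1), 1e-12)
    return normals, variation


# Toy input: a flat 5x5 grid in the z = 0 plane.
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
plane = np.stack([xs.ravel(), ys.ravel(), np.zeros(25)], axis=1)
normals, variation = normals_and_variation(plane, k=8)
```

On this flat patch the estimated normals align with the z-axis up to sign and the surface variation is close to zero; curved or noisy regions would yield larger values. In a pretraining setup of the kind the abstract describes, such per-point quantities would serve as regression targets at the masked points in place of their coordinates.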

updated: Sun Apr 28 2024 18:36:19 GMT+0000 (UTC)

published: Fri Apr 14 2023 03:25:24 GMT+0000 (UTC)

arXiv
