Frozen CLIP Model is Efficient Point Cloud Backbone

Xiaoshui Huang; Sheng Li; Wentao Qu; Tong He; Yifan Zuo; Wanli Ouyang

The pretraining-finetuning paradigm has demonstrated great success in NLP and 2D image fields because of the high-quality representation ability and transferability of their pretrained models. However, pretraining such a strong model is difficult in the 3D point cloud field since the training data is limited and point cloud collection is expensive. This paper introduces Efficient Point Cloud Learning (EPCL), an effective and efficient point cloud learner for directly training high-quality point cloud models with a frozen CLIP model. Our EPCL connects the 2D and 3D modalities by semantically aligning the 2D features and point cloud features without paired 2D-3D data. Specifically, the input point cloud is divided into a sequence of tokens and directly fed into the frozen CLIP model to learn point cloud representation. Furthermore, we design a task token to narrow the gap between 2D images and 3D point clouds. Comprehensive experiments on 3D detection, semantic segmentation, classification and few-shot learning demonstrate that the 2D CLIP model can be an efficient point cloud backbone and our method achieves state-of-the-art accuracy on both real-world and synthetic downstream tasks. Code will be available.
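The abstract does not say how the point cloud is turned into a sequence of tokens. The sketch below is a rough illustration only. It assumes a Point-BERT-style tokenizer:

- farthest point sampling (FPS) picks patch centers,
- k-nearest neighbours form a local patch around each center,
- each patch is embedded, and a task token is prepended to the sequence.

All names here are hypothetical helpers, not the authors' code: `farthest_point_sample`, `knn_group`, `embed_patch` and `tokenize`. The max-pooled linear projection is a toy stand-in for a learned patch encoder. The frozen CLIP transformer itself is not implemented.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def farthest_point_sample(points: Sequence[Point], num_centers: int) -> List[int]:
    """Greedy FPS (hypothetical helper): indices of well-spread patch centers."""
    n = len(points)
    if num_centers > n:
        raise ValueError("num_centers must not exceed the number of points")
    chosen = [0]  # start deterministically from the first point
    # Distance from every point to its nearest already-chosen center.
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < num_centers:
        nxt = max(range(n), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def knn_group(points: Sequence[Point], center_idx: Sequence[int], k: int) -> List[List[Point]]:
    """For each center, gather its k nearest points in center-relative coordinates."""
    groups = []
    for c in center_idx:
        cx, cy, cz = points[c]
        nearest = sorted(range(len(points)), key=lambda i: math.dist(points[i], points[c]))[:k]
        groups.append([(points[i][0] - cx, points[i][1] - cy, points[i][2] - cz) for i in nearest])
    return groups


def embed_patch(patch: Sequence[Point], weights: Sequence[Sequence[float]]) -> List[float]:
    """Toy patch encoder (assumption): linear projection, then max-pool over the patch."""
    return [max(sum(w_j * x_j for w_j, x_j in zip(w, p)) for p in patch) for w in weights]


def tokenize(points, num_tokens, k, weights, task_token) -> List[List[float]]:
    """Point cloud -> [task token, patch tokens...], ready for a (frozen) transformer."""
    centers = farthest_point_sample(points, num_tokens)
    patches = knn_group(points, centers, k)
    tokens = [embed_patch(patch, weights) for patch in patches]
    # Prepend the task token; in training it would be a learnable vector.
    return [list(task_token)] + tokens


if __name__ == "__main__":
    rng = random.Random(0)
    cloud = [(rng.random(), rng.random(), rng.random()) for _ in range(256)]
    dim = 8
    weights = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(dim)]
    seq = tokenize(cloud, num_tokens=16, k=32, weights=weights, task_token=[0.0] * dim)
    print(len(seq), len(seq[0]))  # 17 8
```

In a setup like the one the abstract describes, the CLIP transformer's weights stay fixed. Only components such as the tokenizer, the task token and the downstream head would receive gradient updates.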

updated: Thu Dec 08 2022 06:27:11 GMT+0000 (UTC)

published: Thu Dec 08 2022 06:27:11 GMT+0000 (UTC)

arXiv