Multi-view Vision-Prompt Fusion Network: Can 2D Pre-trained Model Boost 3D Point Cloud Data-scarce Learning?

Haoyang Peng; Baopu Li; Bo Zhang; Xin Chen; Tao Chen; Hongyuan Zhu

Point cloud-based 3D deep models have a wide range of applications, such as autonomous driving and household robots. Inspired by recent advances in prompt learning in natural language processing, this work proposes a novel Multi-view Vision-Prompt Fusion Network (MvNet) for few-shot 3D point cloud classification. MvNet investigates the possibility of leveraging off-the-shelf 2D pre-trained models for few-shot classification, which can alleviate the over-dependence of existing baseline models on large-scale annotated 3D point cloud data. Specifically, MvNet first encodes a 3D point cloud into multi-view image features for a number of different views. Then, a novel multi-view prompt fusion module is developed to effectively fuse information from the different views, bridging the gap between 3D point cloud data and 2D pre-trained models. A set of 2D image prompts can then be derived to better describe the prior knowledge suited to a large-scale pre-trained image model for few-shot 3D point cloud classification. Extensive experiments on the ModelNet, ScanObjectNN, and ShapeNet datasets demonstrate that MvNet achieves new state-of-the-art performance for few-shot 3D point cloud classification. The source code of this work will be available soon.
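The abstract describes the pipeline only at a high level: encode a point cloud as multi-view image features, fuse the views, then classify from a few labelled examples. The following is a minimal, dependency-free Python sketch of that general idea under stated assumptions. It is not the authors' MvNet code, which the abstract says has not yet been released. Orthographic depth maps, four views around the vertical axis, flattening as a stand-in for a frozen 2D pre-trained encoder, plain averaging in place of the learned multi-view prompt fusion module, and nearest-prototype (cosine similarity) classification are all hypothetical choices. The helper names (`rotate_y`, `depth_map`, `multiview_features`, `embed`, `classify`, `rod`, `disc`) are illustrative only.

```python
import math

# Hypothetical toy pipeline: render a point cloud from several views, turn each
# view into a feature vector, fuse the views, and classify a query by its
# nearest class prototype (N-way K-shot). This is NOT MvNet's implementation.


def rotate_y(points, angle):
    """Rotate (x, y, z) points about the vertical y-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]


def depth_map(points, size=8):
    """Orthographic projection onto the x-y plane.

    Assumes points lie inside the unit sphere. Each pixel keeps the largest
    shifted depth z + 1, so empty pixels stay 0.
    """
    img = [[0.0] * size for _ in range(size)]
    for x, y, z in points:
        col = max(0, min(size - 1, int((x + 1) / 2 * size)))
        row = max(0, min(size - 1, int((1 - y) / 2 * size)))
        img[row][col] = max(img[row][col], z + 1.0)
    return img


def multiview_features(points, n_views=4, size=8):
    """One flattened depth map per view, evenly spaced around the y-axis.

    Flattening is a placeholder for a frozen 2D pre-trained image encoder.
    """
    feats = []
    for v in range(n_views):
        img = depth_map(rotate_y(points, 2 * math.pi * v / n_views), size)
        feats.append([p for row in img for p in row])
    return feats


def mean_vectors(vectors):
    """Element-wise mean of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def normalize(vec):
    """Scale a vector to unit length (all-zero vectors are left unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def embed(points):
    # Placeholder fusion: average the per-view features. The paper describes a
    # learned multi-view prompt fusion module instead.
    return normalize(mean_vectors(multiview_features(points)))


def classify(query, support):
    """`support` maps each class label to its K support point clouds."""
    protos = {label: normalize(mean_vectors([embed(c) for c in clouds]))
              for label, clouds in support.items()}
    q = embed(query)
    # Cosine similarity to each prototype (both sides are unit vectors).
    return max(protos, key=lambda k: sum(a * b for a, b in zip(q, protos[k])))


def rod(offset=0.0, n=16):
    """Toy vertical rod: points along the y-axis, shifted by `offset` in x."""
    return [(offset, -0.9 + 1.8 * i / (n - 1), 0.0) for i in range(n)]


def disc(radius=0.9, n=8):
    """Toy horizontal disc lying in the y = 0 plane."""
    pts = []
    for i in range(n):
        for j in range(n):
            x = -radius + 2 * radius * i / (n - 1)
            z = -radius + 2 * radius * j / (n - 1)
            if x * x + z * z <= radius * radius:
                pts.append((x, 0.0, z))
    return pts


# A 2-way 2-shot episode built from the toy shapes.
support = {"rod": [rod(0.0), rod(0.1)], "disc": [disc(0.9), disc(0.8)]}
print(classify(rod(0.05), support))   # prints "rod"
print(classify(disc(0.85), support))  # prints "disc"
```

In a real system the flattened depth map would be replaced by features from a pre-trained image backbone, and the averaging inside `embed` is where a learned fusion step, such as the module the abstract describes, would plug in.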

updated: Thu Apr 20 2023 11:39:41 GMT+0000 (UTC)

published: Thu Apr 20 2023 11:39:41 GMT+0000 (UTC)

arXiv
