ViPFormer: Efficient Vision-and-Pointcloud Transformer for Unsupervised Pointcloud Understanding

Hongyu Sun; Yongcai Wang; Xudong Cai; Xuewei Bai; Deying Li

Recently, a growing number of works have designed unsupervised paradigms for point cloud processing to alleviate the cost of manual annotation and the poor transferability of supervised methods. Among them, CrossPoint follows the contrastive learning framework and exploits image and point cloud data for unsupervised point cloud understanding. Although it shows promising performance, its unbalanced architecture makes it unnecessarily complex and inefficient. For example, the image branch in CrossPoint is about 8.3x heavier than the point cloud branch, which leads to higher complexity and latency. To address this problem, we propose a lightweight Vision-and-Pointcloud Transformer (ViPFormer) that unifies image and point cloud processing in a single architecture. ViPFormer learns in an unsupervised manner by optimizing intra-modal and cross-modal contrastive objectives. The pretrained model is then transferred to various downstream tasks, including 3D shape classification and semantic segmentation. Experiments on different datasets show that ViPFormer surpasses previous state-of-the-art unsupervised methods, with higher accuracy, lower model complexity, and lower runtime latency. Finally, the effectiveness of each component in ViPFormer is validated by extensive ablation studies. The implementation of the proposed method is available at https://github.com/auniquesun/ViPFormer.
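
The abstract does not spell out the form of the intra-modal and cross-modal contrastive objectives. As a rough illustration only, the sketch below assumes an InfoNCE-style loss over paired embeddings, which is a common choice in contrastive learning and not necessarily the loss used in ViPFormer. All function names, the temperature value, the loss weight, and the use of the mean of two point cloud views for the cross-modal term are assumptions made for this example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss (assumed form): anchors[i] should match positives[i]
    and be pushed away from every other positives[j] in the batch."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_info_nce(x, y, temperature=0.1):
    """Average the loss in both matching directions (x to y and y to x)."""
    return 0.5 * (info_nce(x, y, temperature) + info_nce(y, x, temperature))


def combined_objective(pc_view1, pc_view2, img_feats, weight=1.0, temperature=0.1):
    """Hypothetical combination of the two objectives named in the abstract.

    intra: two augmented views of the same point cloud should agree.
    cross: a point cloud embedding should agree with its paired image embedding.
    Using the mean of the two views for the cross-modal term is an assumption.
    """
    intra = symmetric_info_nce(pc_view1, pc_view2, temperature)
    pc_mean = [[(u + v) / 2 for u, v in zip(a, b)] for a, b in zip(pc_view1, pc_view2)]
    cross = symmetric_info_nce(pc_mean, img_feats, temperature)
    return intra + weight * cross
```

With perfectly aligned pairs the loss approaches zero, and with mismatched pairs it grows, which is the behavior a contrastive pretraining objective of this kind relies on.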

updated: Sat Mar 25 2023 06:47:12 GMT+0000 (UTC)

published: Sat Mar 25 2023 06:47:12 GMT+0000 (UTC)

arXiv
