Image2Point: 3D Point-Cloud Understanding with Pretrained 2D ConvNets

Chenfeng Xu; Shijia Yang; Tomer Galanti; Bichen Wu; Xiangyu Yue; Bohan Zhai; Wei Zhan; Peter Vajda; Kurt Keutzer; Masayoshi Tomizuka

3D point-clouds and 2D images are different visual representations of the physical world. While human vision can understand both representations, computer vision models designed for 2D image understanding and for 3D point-cloud understanding are quite different. Our paper explores the potential of transferring 2D model architectures and weights to understand 3D point-clouds, by empirically investigating the feasibility and benefits of the transfer and by shedding light on why the transfer works. We discover that we can indeed use the same architecture and pretrained weights of a neural net model to understand both images and point-clouds. Specifically, we transfer the image-pretrained model to a point-cloud model by copying or inflating the weights. We find that finetuning the transformed image-pretrained models (FIP) with minimal effort, updating only the input, output, and normalization layers, can achieve competitive performance on 3D point-cloud classification, beating a wide range of point-cloud models that adopt task-specific architectures and use a variety of tricks. When the whole model is finetuned, performance improves even further. Meanwhile, FIP improves data efficiency, raising top-1 accuracy by up to 10.0 percentage points on few-shot classification. It also speeds up the training of point-cloud models by up to 11.1x for a target accuracy (e.g., 90% accuracy). Lastly, we provide an explanation of the image-to-point-cloud transfer from the perspective of neural collapse. The code is available at: https://github.com/chenfengxu714/image2point.
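As a rough illustration of what "inflating the weights" could look like, the sketch below repeats a 2D convolution kernel along a new depth axis to obtain a 3D kernel. It is not taken from the authors' repository: the function name `inflate_conv2d_weight`, the `(out_ch, in_ch, kH, kW)` weight layout, and the optional `rescale` flag (dividing by the depth, a convention borrowed from video-model inflation) are all assumptions made for this example.

```python
import numpy as np


def inflate_conv2d_weight(w2d, depth, rescale=False):
    """Inflate a 2D conv kernel into a 3D one (hypothetical helper).

    w2d:     array of shape (out_ch, in_ch, kH, kW)
    depth:   size of the new kernel axis
    rescale: if True, divide by `depth` so the summed response over the
             depth axis matches the 2D kernel (an assumption, not
             something the abstract specifies)
    Returns an array of shape (out_ch, in_ch, depth, kH, kW).
    """
    w2d = np.asarray(w2d)
    if w2d.ndim != 4:
        raise ValueError("expected a weight of shape (out_ch, in_ch, kH, kW)")
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    # Insert a singleton depth axis, then repeat the 2D kernel along it.
    w3d = np.repeat(w2d[:, :, np.newaxis, :, :], depth, axis=2)
    if rescale:
        w3d = w3d / depth
    return w3d


# Example: inflate a random 3x3 kernel bank to 3x3x3.
rng = np.random.default_rng(0)
w2d = rng.standard_normal((8, 3, 3, 3))
w3d = inflate_conv2d_weight(w2d, depth=3)
print(w3d.shape)
```

Every depth slice of the inflated kernel equals the original 2D kernel, so the pretrained filters are reused unchanged. How the input, output, and normalization layers are then finetuned is described in the paper and its released code rather than in this sketch.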

updated: Thu Apr 21 2022 08:30:25 GMT+0000 (UTC)

published: Tue Jun 08 2021 08:42:55 GMT+0000 (UTC)

arXiv
