PointMCD: Boosting Deep Point Cloud Encoders via Multi-view Cross-modal Distillation for 3D Shape Recognition

Qijian Zhang; Junhui Hou; Yue Qian

As two fundamental representation modalities of 3D objects, 3D point clouds and multi-view 2D images record shape information from two different domains: geometric structure and visual appearance. In the current deep learning era, remarkable progress has been made in processing these two data modalities by customizing compatible 3D and 2D network architectures for each. However, unlike multi-view image-based 2D visual modeling paradigms, which have shown leading performance on several common 3D shape recognition benchmarks, point cloud-based 3D geometric modeling paradigms are still highly limited by insufficient learning capacity, owing to the difficulty of extracting discriminative features from irregular geometric signals. In this paper, we explore the possibility of boosting deep 3D point cloud encoders by transferring visual knowledge extracted from deep 2D image encoders under a standard teacher-student distillation workflow. Overall, we propose PointMCD, a unified multi-view cross-modal distillation architecture that comprises a pretrained deep image encoder as the teacher and a deep point encoder as the student. To perform heterogeneous feature alignment between the 2D visual and 3D geometric domains, we further investigate visibility-aware feature projection (VAFP), by which point-wise embeddings are reasonably aggregated into view-specific geometric descriptors. By aligning multi-view visual and geometric descriptors pair-wise, we can obtain more powerful deep point encoders without exhaustive and complicated network modification. Experiments on 3D shape classification, part segmentation, and unsupervised learning strongly validate the effectiveness of our method. The code and data will be publicly available at https://github.com/keeganhk/PointMCD.
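
The abstract does not specify how VAFP or the alignment objective are implemented, so the following is only a rough, hypothetical illustration of the idea. It assumes that a binary visibility mask per view is already available (for example, precomputed during rendering), aggregates the visible points' embeddings with a channel-wise max pooling, and measures pair-wise alignment with a mean (1 - cosine similarity) distance. The function names, the pooling choice, and the loss are assumptions for illustration, not the PointMCD implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def visibility_aware_projection(point_feats: Sequence[Vector],
                                visibility: Sequence[Sequence[bool]]) -> List[Vector]:
    """Hypothetical VAFP-style aggregation: one geometric descriptor per view.

    point_feats: N x C point-wise embeddings from the student point encoder.
    visibility:  V x N mask; visibility[v][i] is True if point i is assumed
                 visible from view v (how visibility is computed is an assumption).
    Returns V x C view-specific descriptors via masked channel-wise max pooling.
    """
    num_channels = len(point_feats[0])
    descriptors = []
    for mask in visibility:
        # Keep only the embeddings of points visible from this view.
        visible = [f for f, m in zip(point_feats, mask) if m]
        if not visible:
            # Assumed fallback: a zero descriptor when no point is visible.
            descriptors.append([0.0] * num_channels)
            continue
        descriptors.append([max(f[c] for f in visible) for c in range(num_channels)])
    return descriptors


def cosine_alignment_loss(visual: Sequence[Vector],
                          geometric: Sequence[Vector]) -> float:
    """Assumed pair-wise alignment loss: mean of (1 - cosine similarity)
    between the teacher's visual descriptor and the student's geometric
    descriptor for the same view."""
    if len(visual) != len(geometric):
        raise ValueError("expected one visual and one geometric descriptor per view")
    total = 0.0
    for t, s in zip(visual, geometric):
        dot = sum(a * b for a, b in zip(t, s))
        norm = math.sqrt(sum(a * a for a in t) * sum(b * b for b in s))
        cos = dot / norm if norm > 0 else 0.0
        total += 1.0 - cos
    return total / len(visual)
```

A small usage example with three points, two channels, and two views: view 0 sees points 0 and 1, view 1 sees only point 2.

```python
feats = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
vis = [[True, True, False], [False, False, True]]
geo = visibility_aware_projection(feats, vis)   # [[1.0, 2.0], [3.0, 1.0]]
loss = cosine_alignment_loss(geo, geo)           # close to 0.0 for identical descriptors
```

Restricting the pooling to visible points is what ties each geometric descriptor to a single view, so it can be compared one-to-one with the image encoder's descriptor for that view.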

updated: Thu Jun 15 2023 06:21:09 GMT+0000 (UTC)

published: Thu Jul 07 2022 07:23:20 GMT+0000 (UTC)

arXiv
