MVTN: Learning Multi-View Transformations for 3D Understanding

Abdullah Hamdi; Faisal AlZahrani; Silvio Giancola; Bernard Ghanem

Multi-view projection techniques have proven highly effective at achieving top-performing results in 3D shape recognition. These methods learn how to combine information from multiple viewpoints. However, the camera viewpoints from which these views are obtained are often fixed for all shapes. To overcome the static nature of current multi-view techniques, we propose learning these viewpoints. Specifically, we introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal viewpoints for 3D shape recognition. As a result, MVTN can be trained end-to-end with any multi-view network for 3D shape classification. We integrate MVTN into a novel adaptive multi-view pipeline that can render both 3D meshes and point clouds. Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks (ModelNet40, ScanObjectNN, ShapeNet Core55). Further analysis indicates that our approach is more robust to occlusion than other methods. We also investigate additional aspects of MVTN, such as 2D pretraining and its use for segmentation. To support further research in this area, we have released MVTorch, a PyTorch library for 3D understanding and generation using multi-view projections.
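
To make the contrast between fixed and adaptive viewpoints concrete, the following is a minimal, standard-library-only sketch. It is not the authors' implementation and does not use MVTorch. The azimuth/elevation parametrization, the tanh-bounded offsets, the default values, and all function names are assumptions chosen for illustration. In the actual method the per-shape offsets would come from a network trained through a differentiable renderer; here they are simply passed in as numbers.

```python
import math


def circular_viewpoints(num_views, elevation_deg=35.0):
    """Fixed baseline (assumed): cameras evenly spaced in azimuth at one elevation."""
    step = 360.0 / num_views
    return [(i * step, elevation_deg) for i in range(num_views)]


def adapt_viewpoints(base_views, raw_offsets, max_shift_deg=30.0):
    """Shift each fixed view by a bounded, shape-dependent offset.

    raw_offsets stands in for the output of a learned regressor
    (hypothetical here). tanh keeps each shift within +/- max_shift_deg.
    """
    adapted = []
    for (azim, elev), (d_azim, d_elev) in zip(base_views, raw_offsets):
        new_azim = (azim + max_shift_deg * math.tanh(d_azim)) % 360.0
        new_elev = elev + max_shift_deg * math.tanh(d_elev)
        # Clamp elevation so the camera never sits exactly on a pole.
        new_elev = max(-89.0, min(89.0, new_elev))
        adapted.append((new_azim, new_elev))
    return adapted


def camera_position(azim_deg, elev_deg, distance=2.2):
    """Convert spherical view angles to a Cartesian camera position (y axis up)."""
    azim = math.radians(azim_deg)
    elev = math.radians(elev_deg)
    x = distance * math.cos(elev) * math.sin(azim)
    y = distance * math.sin(elev)
    z = distance * math.cos(elev) * math.cos(azim)
    return (x, y, z)


if __name__ == "__main__":
    base = circular_viewpoints(4)
    # Made-up offsets standing in for a regressor's per-shape prediction.
    offsets = [(0.5, -0.2), (0.0, 0.0), (-1.0, 0.3), (2.0, 1.0)]
    for azim, elev in adapt_viewpoints(base, offsets):
        print(camera_position(azim, elev))
```

With all offsets at zero, the adaptive views reduce to the fixed circular configuration, which is one way to see learned viewpoints as a generalization of the static setup the abstract describes.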

updated: Thu Jun 06 2024 15:12:31 GMT+0000 (UTC)

published: Tue Dec 27 2022 12:09:16 GMT+0000 (UTC)

arXiv
