Unsupervised Learning of Fine Structure Generation for 3D Point Clouds by 2D Projection Matching

Chen Chao; Zhizhong Han; Yu-Shen Liu; Matthias Zwicker


Learning to generate 3D point clouds without 3D supervision is an important but challenging problem. Current solutions leverage various differentiable renderers to project the generated 3D point clouds onto a 2D image plane, and train deep neural networks using the per-pixel difference with 2D ground truth images. However, these solutions still struggle to fully recover fine structures of 3D shapes, such as thin tubes or planes. To resolve this issue, we propose an unsupervised approach for 3D point cloud generation with fine structures. Specifically, we cast 3D point cloud learning as a 2D projection matching problem. Rather than using entire 2D silhouette images as regular pixel supervision, we introduce structure adaptive sampling to randomly sample 2D points within the silhouettes as irregular point supervision, which alleviates the consistency issue of sampling from different view angles. Our method pushes the neural network to generate a 3D point cloud whose 2D projections match the irregular point supervision from different view angles. Our 2D projection matching approach enables the neural network to learn more accurate structure information than using the per-pixel difference, especially for fine and thin 3D structures. Our method can recover fine 3D structures from 2D silhouette images at different resolutions, and is robust to different sampling methods and numbers of points in the irregular point supervision. Our method outperforms others on widely used benchmarks. Our code, data, and models are available at https://github.com/chenchao15/2D_projection_matching.
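The matching idea in the abstract can be illustrated with a small, non-authoritative sketch. The abstract does not specify the camera model, the sampling scheme, or the matching loss, so everything here is an assumption: uniform sampling inside the silhouette stands in for structure adaptive sampling, the camera is orthographic, and a symmetric Chamfer distance is swapped in as the matching loss. The function names (`sample_silhouette_points`, `project_orthographic`, `chamfer_2d`) are hypothetical and are not taken from the authors' repository. The sketch only evaluates the loss; training a network would additionally require a differentiable implementation.

```python
import math
import random


def sample_silhouette_points(mask, n, rng):
    """Draw n continuous 2D points (x, y) uniformly from the foreground of a binary mask."""
    foreground = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    points = []
    for _ in range(n):
        r, c = rng.choice(foreground)
        # Jitter inside the chosen pixel so the supervision is off-grid (irregular).
        points.append((c + rng.random(), r + rng.random()))
    return points


def project_orthographic(points3d, yaw):
    """Rotate about the vertical (y) axis by yaw radians, then drop depth (assumed camera)."""
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    return [(cos_y * x + sin_y * z, y) for x, y, z in points3d]


def chamfer_2d(a, b):
    """Symmetric mean squared nearest-neighbour distance between two 2D point sets."""
    def one_way(src, dst):
        return sum(min((sx - dx) ** 2 + (sy - dy) ** 2 for dx, dy in dst)
                   for sx, sy in src) / len(src)
    return one_way(a, b) + one_way(b, a)


# Toy silhouette: a one-pixel-wide vertical bar in a 16x16 image (a "thin" structure).
mask = [[1 if c == 8 else 0 for c in range(16)] for r in range(16)]
rng = random.Random(0)
supervision = sample_silhouette_points(mask, 64, rng)

# Two candidate 3D clouds viewed at yaw = 0: one on the bar, one lying across it.
cloud_good = [(8.5, r + 0.5, 0.0) for r in range(16)]
cloud_bad = [(c + 0.5, 8.0, 0.0) for c in range(16)]

loss_good = chamfer_2d(project_orthographic(cloud_good, 0.0), supervision)
loss_bad = chamfer_2d(project_orthographic(cloud_bad, 0.0), supervision)
print(loss_good < loss_bad)
```

In this toy setup, the cloud that lies on the thin bar should score a much lower loss than the cloud lying across it. Because the loss compares point sets directly, a one-pixel-wide structure contributes as many supervision points as the sampler places on it, which is one plausible reading of why point matching could help with thin structures.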

updated: Sun Aug 08 2021 22:15:31 GMT+0000 (UTC)

published: Sun Aug 08 2021 22:15:31 GMT+0000 (UTC)

arXiv
