Detailed 3D Human Body Reconstruction from Multi-view Images Combining Voxel Super-Resolution and Learned Implicit Representation

Zhongguo Li; Magnus Oskarsson; Anders Heyden

Reconstructing detailed 3D human body models from images is an interesting but challenging task in computer vision because of the high degrees of freedom of the human body. To tackle this problem, we propose a coarse-to-fine method that reconstructs a detailed 3D human body from multi-view images by combining voxel super-resolution with a learned implicit representation. First, coarse 3D models are estimated by learning an implicit representation based on multi-scale features, which are extracted from the multi-view images by multi-stage hourglass networks. Then, taking the low-resolution voxel grids generated from the coarse 3D models as input, voxel super-resolution based on an implicit representation is learned through a multi-stage 3D convolutional neural network. Finally, refined, detailed 3D human body models are produced by the voxel super-resolution, which preserves details and reduces false reconstructions in the coarse 3D models. Thanks to the implicit representation, training is memory efficient, and the detailed 3D human body produced from multi-view images is a continuous decision boundary with high-resolution geometry. In addition, the coarse-to-fine method based on voxel super-resolution removes false reconstructions while preserving appearance details in the final reconstruction. In experiments, our method achieves competitive 3D human body reconstructions, both quantitatively and qualitatively, from images with various poses and shapes on both real and synthetic datasets.
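The role of the implicit representation can be illustrated with a minimal sketch. It is not the authors' network: `toy_occupancy` is a hypothetical stand-in (a sphere) for a learned occupancy function, and the resolutions, the helper names, and the 0.5 threshold are assumptions chosen for illustration. The sketch only shows that a single continuous function can be sampled at any voxel resolution, with the surface taken as the decision boundary between inside and outside.

```python
import math

def toy_occupancy(x, y, z, radius=0.5):
    """Hypothetical stand-in for a learned implicit function:
    returns 1.0 inside a sphere of the given radius, 0.0 outside."""
    return 1.0 if math.sqrt(x * x + y * y + z * z) < radius else 0.0

def voxel_centres(resolution, lo=-1.0, hi=1.0):
    """Coordinates of the voxel centres along one axis of a cubic grid."""
    step = (hi - lo) / resolution
    return [lo + step * (i + 0.5) for i in range(resolution)]

def voxelize(occupancy_fn, resolution, threshold=0.5):
    """Query the implicit function at every voxel centre and keep the
    voxels on the 'inside' side of the decision boundary."""
    axis = voxel_centres(resolution)
    return [[[occupancy_fn(x, y, z) > threshold for z in axis]
             for y in axis]
            for x in axis]

def occupied_fraction(grid):
    """Fraction of voxels marked as occupied."""
    cells = [v for plane in grid for row in plane for v in row]
    return sum(cells) / len(cells)

# The same function yields a coarse grid and a finer grid; no dense
# high-resolution volume has to be stored to define the shape itself.
coarse = voxelize(toy_occupancy, 8)
fine = voxelize(toy_occupancy, 32)
print(occupied_fraction(coarse), occupied_fraction(fine))
```

In the method described above, the occupancy function would instead be learned from multi-view image features (coarse stage) or from low-resolution voxel grids (super-resolution stage); the toy sphere here replaces both.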

updated: Fri Dec 11 2020 08:07:39 GMT+0000 (UTC)

published: Fri Dec 11 2020 08:07:39 GMT+0000 (UTC)

arXiv
