HDhuman: High-quality Human Performance Capture with Sparse Views

Tiansong Zhou; Tao Yu; Ruizhi Shao; Kun Li

In this paper, we introduce HDhuman, a method for novel view rendering of human performers wearing clothes with complex texture patterns, using a sparse set of camera views. Although some recent works have achieved remarkable rendering quality on humans with relatively uniform textures using sparse views, their rendering quality remains limited on complex texture patterns because they are unable to recover the high-frequency geometry details observed in the input views. To this end, HDhuman uses a human reconstruction network equipped with a pixel-aligned spatial transformer, together with a rendering network that uses geometry-guided pixel-wise feature integration, to achieve high-quality human reconstruction and rendering. The pixel-aligned spatial transformer computes the correlations between the input views, producing human reconstruction results with high-frequency details. Based on the surface reconstruction results, geometry-guided pixel-wise visibility reasoning provides guidance for multi-view feature integration, enabling the rendering network to render high-quality images at 2K resolution from novel views. Unlike previous neural rendering works, which need to train or fine-tune a separate network for each scene, our method is a general framework that generalizes to novel subjects. Experiments show that our approach outperforms all prior generic or specific methods on both synthetic and real-world data.
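To make the idea of geometry-guided visibility reasoning concrete, here is a minimal sketch of one common way to do it: a per-view depth test followed by a visibility-weighted feature average. This is not the HDhuman implementation. The abstract does not specify how visibility is computed or how features are combined, so the pinhole camera model, the depth tolerance `tol`, and the function names `project`, `visibility_weights`, and `blend_features` are all assumptions made for illustration.

```python
import numpy as np

def project(point, K, R, t):
    """Project a 3D world point into a pinhole camera.

    Returns (u, v, depth), or None if the point is behind the camera.
    Hypothetical helper: assumes K is a 3x3 intrinsic matrix and (R, t)
    map world coordinates to camera coordinates.
    """
    cam = R @ point + t
    if cam[2] <= 0:
        return None
    uv = K @ (cam / cam[2])
    return uv[0], uv[1], cam[2]

def visibility_weights(point, cameras, depth_maps, tol=0.01):
    """One weight per source view: 1.0 if the point passes the depth test.

    depth_maps are assumed to be rendered from the reconstructed surface,
    one per camera. A point is treated as visible when its depth matches
    the depth map at its projected pixel to within tol.
    """
    weights = []
    for (K, R, t), depth in zip(cameras, depth_maps):
        projected = project(point, K, R, t)
        visible = False
        if projected is not None:
            u, v, z = projected
            col, row = int(round(u)), int(round(v))
            height, width = depth.shape
            if 0 <= row < height and 0 <= col < width:
                visible = abs(depth[row, col] - z) < tol
        weights.append(1.0 if visible else 0.0)
    return np.array(weights)

def blend_features(features, weights):
    """Visibility-weighted average of per-view features (views x channels)."""
    total = weights.sum()
    if total == 0:
        # Assumed fallback: plain average when no view sees the point.
        return features.mean(axis=0)
    return (weights[:, None] * features).sum(axis=0) / total
```

In this sketch, a view whose depth map reports a closer surface at the projected pixel is treated as occluded and contributes nothing to the blended feature. A learned integration module would replace the hard 0/1 weights with predicted ones.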

updated: Mon Jan 24 2022 12:49:11 GMT+0000 (UTC)

published: Thu Jan 20 2022 13:04:59 GMT+0000 (UTC)

arXiv
