Free-Viewpoint RGB-D Human Performance Capture and Rendering

Phong Nguyen; Nikolaos Sarafianos; Christoph Lassner; Janne Heikkila; Tony Tung

Capturing and faithfully rendering photo-realistic humans from novel views is a fundamental problem for AR/VR applications. While prior work has shown impressive performance capture results in laboratory settings, it is non-trivial to achieve casual free-viewpoint human capture and rendering for unseen identities with high fidelity, especially for facial expressions, hands, and clothes. To tackle these challenges, we introduce a novel view synthesis framework that generates realistic renders from unseen views of any human captured by a single-view, sparse RGB-D sensor, similar to a low-cost depth camera, without actor-specific models. We propose an architecture that creates dense feature maps in novel views through sphere-based neural rendering and produces complete renders using a global context inpainting model. Additionally, an enhancer network improves the overall fidelity, even in areas occluded in the original view, producing crisp renders with fine details. We show that our method generates high-quality novel views of synthetic and real human actors given a single-stream, sparse RGB-D input. It generalizes to unseen identities and new poses, and faithfully reconstructs facial expressions. Our approach outperforms prior view synthesis methods and is robust to different levels of depth sparsity.
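
To make the geometric starting point of such a pipeline concrete, the following is a minimal sketch of warping sparse RGB-D pixels into a novel view with a plain pinhole camera and a z-buffer. It is not the sphere-based neural renderer described in the abstract, and it is not taken from the paper: the function names (`unproject`, `warp_to_novel_view`), the use of `None` for missing depth, and the assumption that both views share the same intrinsics (`fx`, `fy`, `cx`, `cy`) are all illustrative choices.

```python
from math import inf
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def unproject(depth: List[List[Optional[float]]], fx: float, fy: float,
              cx: float, cy: float) -> List[Tuple[int, int, Point]]:
    """Lift every valid depth pixel (u, v, z) to a 3D point in camera space."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0:  # sparse sensor: many pixels carry no depth
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((u, v, (x, y, z)))
    return points


def warp_to_novel_view(depth, colors, fx, fy, cx, cy,
                       rotation: Sequence[Sequence[float]],
                       translation: Sequence[float],
                       width: int, height: int):
    """Splat source pixels into the target view, keeping the nearest point per pixel.

    Assumption: the target camera uses the same intrinsics as the source.
    Pixels that receive no point stay None (holes caused by sparsity or occlusion).
    """
    target = [[None] * width for _ in range(height)]
    zbuf = [[inf] * width for _ in range(height)]
    for u, v, p in unproject(depth, fx, fy, cx, cy):
        # Rigid transform from the source camera frame to the target camera frame.
        q = [sum(rotation[i][k] * p[k] for k in range(3)) + translation[i]
             for i in range(3)]
        if q[2] <= 0:  # point lies behind the target camera
            continue
        tu = round(fx * q[0] / q[2] + cx)
        tv = round(fy * q[1] / q[2] + cy)
        if 0 <= tu < width and 0 <= tv < height and q[2] < zbuf[tv][tu]:
            zbuf[tv][tu] = q[2]
            target[tv][tu] = colors[v][u]
    return target


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

In this sketch, the `None` entries left in the warped image correspond to the holes that sparse depth and occlusion produce in a novel view. Filling such regions is the role the abstract assigns to the global context inpainting model and the enhancer network.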

updated: Sun Jul 10 2022 14:19:00 GMT+0000 (UTC)

published: Mon Dec 27 2021 20:13:53 GMT+0000 (UTC)

arXiv
