Neural 3D Video Synthesis

Tianye Li; Mira Slavcheva; Michael Zollhoefer; Simon Green; Christoph Lassner; Changil Kim; Tanner Schmidt; Steven Lovegrove; Michael Goesele; Zhaoyang Lv

We propose a novel approach for 3D video synthesis that represents multi-view video recordings of a dynamic real-world scene in a compact yet expressive form, enabling high-quality view synthesis and motion interpolation. Our approach takes the high quality and compactness of static neural radiance fields in a new direction: to a model-free, dynamic setting. At the core of our approach is a novel time-conditioned neural radiance field that represents scene dynamics using a set of compact latent codes. To exploit the fact that changes between adjacent frames of a video are typically small and locally consistent, we propose two novel strategies for efficient training of our neural network: 1) an efficient hierarchical training scheme, and 2) an importance sampling strategy that selects the next rays for training based on the temporal variation of the input videos. In combination, these two strategies significantly boost training speed, lead to fast convergence, and enable high-quality results. Our learned representation is highly compact: it can represent a 10-second, 30 FPS multi-view video recording captured by 18 cameras with a model size of just 28 MB. We demonstrate that our method can render high-fidelity, wide-angle novel views at over 1K resolution, even for highly complex and dynamic scenes. We perform an extensive qualitative and quantitative evaluation showing that our approach outperforms the current state of the art. Additional videos and information are available at: https://neural-3d-video.github.io/
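The abstract states that training rays are chosen by importance sampling based on the temporal variation of the input videos, but it does not give the weighting formula. The minimal Python sketch below assumes a hypothetical weighting: each pixel's weight is its absolute difference from that pixel's temporal median, clamped from below by a floor (`floor`, an assumed parameter) so that static pixels are still sampled occasionally. The function names, the single-camera grayscale frame layout, and the floor value are illustrative assumptions, not the authors' implementation.

```python
import random
import statistics


def temporal_variation_weights(video, floor=0.05):
    """Per-(frame, pixel) sampling weights from temporal variation.

    Hypothetical weighting: |pixel value - that pixel's temporal median|,
    clamped from below by `floor` so static pixels keep a nonzero chance.
    `video` is a list of frames from ONE camera; each frame is a flat list
    of grayscale values in [0, 1] (an assumed layout).
    """
    num_pixels = len(video[0])
    # The temporal median of each pixel approximates the static background.
    medians = [statistics.median(frame[p] for frame in video) for p in range(num_pixels)]
    return [
        [max(abs(frame[p] - medians[p]), floor) for p in range(num_pixels)]
        for frame in video
    ]


def sample_rays(weights, batch_size, rng=None):
    """Draw (frame_index, pixel_index) pairs with probability proportional to weight."""
    if rng is None:
        rng = random.Random(0)  # fixed seed for reproducibility in this sketch
    candidates = [(t, p) for t, row in enumerate(weights) for p in range(len(row))]
    flat = [weights[t][p] for t, p in candidates]
    # Sampling with replacement; each pair stands for the camera ray through that pixel.
    return rng.choices(candidates, weights=flat, k=batch_size)


# Usage: only pixel 0 changes, and only in frame 1.
video = [
    [0.0, 0.5, 0.5, 0.5],
    [1.0, 0.5, 0.5, 0.5],
    [0.0, 0.5, 0.5, 0.5],
]
weights = temporal_variation_weights(video)
batch = sample_rays(weights, batch_size=8)
print(weights[1])  # [1.0, 0.05, 0.05, 0.05]
```

In this toy example the flashing pixel in frame 1 gets weight 1.0 while the other eleven entries get the floor of 0.05, so it is expected to be drawn about 1.0 / 1.55 ≈ 65% of the time. Training therefore concentrates on the rays where the scene actually changes, which is the intuition the abstract describes.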

updated: Wed Mar 03 2021 18:47:40 GMT+0000 (UTC)

published: Wed Mar 03 2021 18:47:40 GMT+0000 (UTC)

arXiv