Monocular Dynamic View Synthesis: A Reality Check

Hang Gao; Ruilong Li; Shubham Tulsiani; Bryan Russell; Angjoo Kanazawa

We study the recent progress on dynamic view synthesis (DVS) from monocular video. Though existing approaches have demonstrated impressive results, we show a discrepancy between the practical capture process and the existing experimental protocols, which effectively leaks in multi-view signals during training. We define effective multi-view factors (EMFs) to quantify the amount of multi-view signal present in the input capture sequence based on the relative camera-scene motion. We introduce two new metrics, co-visibility masked image metrics and correspondence accuracy, which overcome the issue in existing protocols. We also propose a new iPhone dataset that includes more diverse real-life deformation sequences. Using our proposed experimental protocol, we show that the state-of-the-art approaches suffer a 1-2 dB drop in masked PSNR in the absence of multi-view cues and a 4-5 dB drop when modeling complex motion. Code and data can be found at https://hangg7.com/dycheck.
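
To make the "masked PSNR" figures above concrete, here is a minimal sketch of a PSNR restricted to a pixel mask. It is an illustration under stated assumptions, not the authors' implementation: the function name `masked_psnr`, the flat-list image representation, and the `max_val` parameter are all hypothetical, and how the co-visibility mask itself is obtained is not described in this abstract, so the mask is simply taken as given.

```python
import math


def masked_psnr(pred, target, mask, max_val=1.0):
    """PSNR in dB over the pixels where `mask` is truthy.

    Hypothetical helper for illustration only. `pred`, `target`, and `mask`
    are flat sequences of equal length; intensities lie in [0, max_val].
    """
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target, and mask must have the same length")

    # Keep squared errors only for pixels selected by the mask.
    sq_err = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if not sq_err:
        raise ValueError("mask selects no pixels")

    mse = sum(sq_err) / len(sq_err)
    if mse == 0:
        return math.inf  # identical on every masked pixel
    return 10.0 * math.log10(max_val ** 2 / mse)


# The middle pixel has a large error but is excluded by the mask,
# so it does not affect the score.
score = masked_psnr([0.5, 0.0, 1.0], [0.5, 1.0, 0.9], [1, 0, 1])
```

Because the error is averaged over masked pixels only, regions excluded by the mask neither reward nor penalize a method, which is the general idea behind evaluating on a restricted set of pixels.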

updated: Mon Oct 24 2022 17:58:28 GMT+0000 (UTC)

published: Mon Oct 24 2022 17:58:28 GMT+0000 (UTC)

arXiv
