Consistent Depth of Moving Objects in Video

Zhoutong Zhang; Forrester Cole; Richard Tucker; William T. Freeman; Tali Dekel

We present a method to estimate the depth of a dynamic scene, containing arbitrary moving objects, from an ordinary video captured with a moving camera. We seek a geometrically and temporally consistent solution to this underconstrained problem: the depth predictions of corresponding points across frames should induce plausible, smooth motion in 3D. We formulate this objective in a new test-time training framework where a depth-prediction CNN is trained in tandem with an auxiliary scene-flow prediction MLP over the entire input video. By recursively unrolling the scene-flow prediction MLP over varying time steps, we compute both short-range scene flow to impose local smooth motion priors directly in 3D, and long-range scene flow to impose multi-view consistency constraints with wide baselines. We demonstrate accurate and temporally coherent results on a variety of challenging videos containing diverse moving objects (pets, people, cars), as well as camera motion. Our depth maps give rise to a number of depth- and motion-aware video editing effects, such as object and lighting insertion.
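The recursive unrolling described above can be illustrated with a toy, pure-Python sketch. It is not the authors' implementation. The scene-flow MLP is replaced by a hypothetical callable `flow(x, t)` that returns a point's 3D displacement from frame t to t + 1. Pixels are lifted to 3D with an assumed pinhole camera and camera-to-world pose. The two loss forms (a discrete-acceleration penalty for the short-range prior, a squared endpoint error for the long-range constraint) are illustrative assumptions, not the paper's exact losses. A real system would use differentiable tensors so both networks can be optimized by gradient descent.

```python
from typing import Callable, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
# Hypothetical stand-in for the scene-flow MLP: maps a world-space 3D point and a
# frame index t to that point's 3D displacement from frame t to frame t + 1.
SceneFlowFn = Callable[[Vec3, int], Vec3]


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def sq_norm(a: Vec3) -> float:
    return a[0] ** 2 + a[1] ** 2 + a[2] ** 2


def lift_pixel(u: float, v: float, depth: float,
               fx: float, fy: float, cx: float, cy: float,
               rot: Sequence[Sequence[float]], trans: Vec3) -> Vec3:
    """Back-project pixel (u, v) with its predicted depth through an assumed
    pinhole camera, then map it to world space with a camera-to-world pose."""
    cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    world = tuple(sum(rot[i][j] * cam[j] for j in range(3)) for i in range(3))
    return add(world, trans)


def unroll(flow: SceneFlowFn, x: Vec3, t: int, steps: int) -> List[Vec3]:
    """Apply the one-step flow recursively, tracing x from frame t to t + steps."""
    trajectory = [x]
    for k in range(steps):
        x = add(x, flow(x, t + k))
        trajectory.append(x)
    return trajectory


def smoothness_loss(flow: SceneFlowFn, x: Vec3, t: int) -> float:
    """Short-range prior (assumed form): squared change in 3D velocity over two
    consecutive unrolled steps, i.e. a discrete acceleration penalty."""
    p0, p1, p2 = unroll(flow, x, t, 2)
    return sq_norm(sub(sub(p2, p1), sub(p1, p0)))


def long_range_loss(flow: SceneFlowFn, x: Vec3, t: int, steps: int,
                    target: Vec3) -> float:
    """Long-range consistency (assumed form): after `steps` unrolled steps the
    point should land on `target`, e.g. the corresponding pixel in frame
    t + steps lifted to 3D with that frame's predicted depth."""
    end = unroll(flow, x, t, steps)[-1]
    return sq_norm(sub(end, target))


if __name__ == "__main__":
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    x0 = lift_pixel(320.0, 240.0, 2.0, 500.0, 500.0, 320.0, 240.0,
                    identity, (0.0, 0.0, 0.0))

    def constant_flow(x: Vec3, t: int) -> Vec3:
        return (0.1, 0.0, 0.0)  # constant 3D velocity along x

    print(x0)
    print(smoothness_loss(constant_flow, x0, 0))
    print(long_range_loss(constant_flow, x0, 0, 5, (0.5, 0.0, 2.0)))
```

With a constant-velocity flow, the smoothness term vanishes. The long-range term vanishes whenever the five-step unrolled trajectory ends at the lifted correspondence. Unrolling a single one-step model, rather than training separate models per frame gap, is what lets the same flow serve both the short- and long-range terms.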

updated: Mon Aug 02 2021 20:53:18 GMT+0000 (UTC)

published: Mon Aug 02 2021 20:53:18 GMT+0000 (UTC)

arXiv