Look Outside the Room: Synthesizing A Consistent Long-Term 3D Scene Video from A Single Image

Xuanchi Ren; Xiaolong Wang

Novel view synthesis from a single image has recently attracted considerable attention, advanced primarily by 3D deep learning and rendering techniques. However, most existing work is limited to synthesizing new views within relatively small camera motions. In this paper, we propose a novel approach to synthesize a consistent long-term video given a single scene image and a trajectory of large camera motions. Our approach uses an autoregressive Transformer to perform sequential modeling of multiple frames, reasoning about the relations between the frames and their corresponding cameras to predict the next frame. To facilitate learning and ensure consistency among generated frames, we introduce a locality constraint based on the input cameras to guide self-attention among a large number of patches across space and time. Our method outperforms state-of-the-art view synthesis approaches by a large margin, especially when synthesizing long-term future frames in indoor 3D scenes. Project page: https://xrenaa.github.io/look-outside-room/

updated: Thu Mar 17 2022 17:16:16 GMT+0000 (UTC)

published: Thu Mar 17 2022 17:16:16 GMT+0000 (UTC)

arXiv
