LDMVFI: Video Frame Interpolation with Latent Diffusion Models

Duolikun Danier; Fan Zhang; David Bull

Existing work on video frame interpolation (VFI) mostly employs deep neural networks trained to minimize the L1 or L2 distance between their outputs and ground-truth frames. Despite recent advances, existing VFI methods tend to produce perceptually inferior results, particularly in challenging scenarios involving large motions and dynamic textures. Towards developing perceptually oriented VFI methods, we propose latent diffusion model-based VFI, LDMVFI. It approaches the VFI problem from a generative perspective by formulating it as a conditional generation problem. As the first effort to address VFI using latent diffusion models, we rigorously benchmark our method following the common evaluation protocol adopted in the existing VFI literature. Our quantitative experiments and user study indicate that LDMVFI is able to interpolate video content with superior perceptual quality compared to the state of the art, even in the high-resolution regime. Our source code will be made available here.

updated: Mon Jul 17 2023 15:51:03 GMT+0000 (UTC)

published: Thu Mar 16 2023 17:24:41 GMT+0000 (UTC)

arXiv
