FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation

Tarun Kalluri; Deepak Pathak; Manmohan Chandraker; Du Tran

Most methods for video frame interpolation compute bidirectional optical flow between adjacent frames of a video, followed by a suitable warping algorithm to generate the output frames. However, approaches relying on optical flow often fail to model occlusions and complex non-linear motions directly from the video and introduce additional bottlenecks unsuitable for widespread deployment. We address these limitations with FLAVR, a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video frame interpolation. Our method efficiently learns to reason about non-linear motions, complex occlusions and temporal abstractions, resulting in improved performance on video interpolation, while requiring no additional inputs in the form of optical flow or depth maps. Due to its simplicity, FLAVR can deliver 3x faster inference speed compared to the current most accurate method on multi-frame interpolation without losing interpolation accuracy. In addition, we evaluate FLAVR on a wide range of challenging settings and consistently demonstrate superior qualitative and quantitative results compared with prior methods on various popular benchmarks including Vimeo-90K, UCF101, DAVIS, Adobe, and GoPro. Finally, we demonstrate that video frame interpolation with FLAVR can serve as a useful self-supervised pretext task for action recognition, optical flow estimation, and motion magnification.
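To make the core building block concrete, here is a minimal sketch, not the authors' implementation, of a single 3D space-time convolution that maps a short stack of input frames directly to one output frame, with no optical-flow or warping step. It assumes single-channel (grayscale) frames stored as nested Python lists, four context frames, and a kernel whose temporal extent equals the number of input frames so that the time axis collapses to one output frame. The names `conv3d_single` and `center_tap_kernel` are hypothetical, and the weights are hand-set for illustration; an end-to-end trained network would learn its weights instead.

```python
from typing import List

Frame = List[List[float]]          # one H x W grayscale frame (assumption: 1 channel)
Clip = List[Frame]                 # T frames stacked along the time axis
Kernel = List[List[List[float]]]   # kt x kh x kw weights


def conv3d_single(clip: Clip, kernel: Kernel, bias: float = 0.0) -> Clip:
    """Single-channel 3D convolution (cross-correlation, as in most DL libraries).

    Time: 'valid' (no padding), so T_out = T - kt + 1.
    Space: zero padding of kh//2 and kw//2, so odd kernels keep H x W.
    """
    t_in, h, w = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    ph, pw = kh // 2, kw // 2
    out: Clip = []
    for t0 in range(t_in - kt + 1):            # slide the kernel along time
        frame = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                acc = bias
                for dt in range(kt):           # temporal taps
                    for dy in range(kh):       # spatial taps (rows)
                        yy = y + dy - ph
                        if not 0 <= yy < h:
                            continue           # zero padding outside the frame
                        for dx in range(kw):   # spatial taps (columns)
                            xx = x + dx - pw
                            if 0 <= xx < w:
                                acc += kernel[dt][dy][dx] * clip[t0 + dt][yy][xx]
                frame[y][x] = acc
        out.append(frame)
    return out


def center_tap_kernel(temporal_weights: List[float], kh: int = 3, kw: int = 3) -> Kernel:
    """Hypothetical helper: a kernel that reads only the spatial centre of each frame."""
    kernel = [[[0.0] * kw for _ in range(kh)] for _ in temporal_weights]
    for dt, wt in enumerate(temporal_weights):
        kernel[dt][kh // 2][kw // 2] = wt
    return kernel


# Four 2 x 3 context frames whose pixels all equal 0, 10, 20 and 30.
clip = [[[10.0 * t] * 3 for _ in range(2)] for t in range(4)]
# Temporal extent 4 == number of input frames, so the output has exactly one frame.
kernel = center_tap_kernel([0.0, 0.5, 0.5, 0.0])
print(conv3d_single(clip, kernel))  # → [[[15.0, 15.0, 15.0], [15.0, 15.0, 15.0]]]
```

With these hand-set weights the kernel only blends the two middle frames pixel by pixel, which is roughly what naive linear interpolation does. The point of the sketch is that the same operation, with learned weights spanning several frames and a spatial neighbourhood, can in principle represent motion and occlusion cues without an explicit flow field.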

updated: Thu Apr 15 2021 18:53:49 GMT+0000 (UTC)

published: Tue Dec 15 2020 18:59:30 GMT+0000 (UTC)

arXiv