Video Frame Interpolation with Flow Transformer

Pan Gao; Haoyue Tian; Jie Qin

Video frame interpolation has been actively studied alongside the development of convolutional neural networks. However, because convolution shares kernel weights across spatial positions, frames interpolated by convolutional models may lose details. In contrast, the attention mechanism in the Transformer can better distinguish the contribution of each pixel and can also capture long-range pixel dependencies, which offers great potential for video interpolation. Nevertheless, the original Transformer is commonly used for 2D images; how to develop a Transformer-based framework that accounts for temporal self-attention in video frame interpolation remains an open issue. In this paper, we propose the Video Frame Interpolation Flow Transformer, which incorporates motion dynamics from optical flow into the self-attention mechanism. Specifically, we design a Flow Transformer Block that computes temporal self-attention within a matched local area under the guidance of flow, making our framework suitable for interpolating frames with large motion while maintaining reasonably low complexity. In addition, we construct a multi-scale architecture to account for multi-scale motion, further improving overall performance. Extensive experiments on three benchmarks demonstrate that the proposed method generates interpolated frames with better visual quality than state-of-the-art methods.
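
To make the idea of flow-guided local attention concrete, here is a minimal NumPy sketch. It is not the authors' Flow Transformer Block: the function name `flow_guided_local_attention`, the `(H, W, C)` tensor layout, the `(dx, dy)` flow convention, nearest-pixel rounding, border clamping, and the single-head dot-product form are all assumptions made for illustration. Each query pixel attends only to a small window in another frame, and the window is centred where the optical flow points.

```python
import numpy as np

def flow_guided_local_attention(query, key, value, flow, window=3):
    """Illustrative sketch (hypothetical helper, not the paper's implementation).

    query, key: float arrays of shape (H, W, C)
    value:      float array of shape (H, W, Cv)
    flow:       array of shape (H, W, 2) with assumed (dx, dy) displacements
                from the query frame to the key/value frame
    """
    H, W, C = query.shape
    r = window // 2
    offsets = np.arange(-r, r + 1)
    out = np.zeros_like(value, dtype=np.float64)
    for y in range(H):
        for x in range(W):
            # Centre of the matched window: the pixel position shifted by the flow,
            # rounded to the nearest pixel (an assumption of this sketch).
            cx = int(np.rint(x + flow[y, x, 0]))
            cy = int(np.rint(y + flow[y, x, 1]))
            # Clamp window coordinates to the image borders.
            ys = np.clip(cy + offsets, 0, H - 1)
            xs = np.clip(cx + offsets, 0, W - 1)
            k = key[np.ix_(ys, xs)].reshape(-1, C)
            v = value[np.ix_(ys, xs)].reshape(-1, value.shape[-1])
            # Scaled dot-product scores, then a numerically stable softmax.
            scores = k @ query[y, x] / np.sqrt(C)
            scores = scores - scores.max()
            weights = np.exp(scores)
            weights /= weights.sum()
            out[y, x] = weights @ v
    return out
```

In this sketch each pixel is compared with `window * window` keys rather than all `H * W` positions, which is one way a local, flow-matched window can keep attention cost low even when the motion itself is large.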

updated: Sun Jul 30 2023 06:44:37 GMT+0000 (UTC)

published: Sun Jul 30 2023 06:44:37 GMT+0000 (UTC)

arXiv
