Cross-Attention Transformer for Video Interpolation

Hannah Halin Kim; Shuzhi Yu; Shuai Yuan; Carlo Tomasi

We propose TAIN (Transformers and Attention for video INterpolation), a residual neural network for video interpolation that predicts an intermediate frame from the two consecutive image frames surrounding it. We first present a novel vision transformer module, named Cross Similarity (CS), that globally aggregates input image features similar in appearance to those of the predicted interpolated frame. These CS features are then used to refine the interpolated prediction. To account for occlusions in the CS features, we propose an Image Attention (IA) module that lets the network favor CS features from one frame over those from the other. On the Vimeo90k, UCF101, and SNU-FILM benchmarks, TAIN outperforms existing methods that do not require flow estimation and performs comparably to flow-based methods, while remaining computationally efficient in inference time.
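
The abstract gives no equations, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that CS behaves like standard scaled dot-product cross-attention, with features of the predicted frame as queries and features of one input frame as keys and values, and that IA reduces to a per-position softmax weight over the two frames. The function names (`cross_similarity`, `blend_with_image_attention`), the toy feature values, and the attention logits are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_similarity(pred_feats, frame_feats):
    """For each predicted-frame feature (query), return a similarity-weighted
    average of ALL input-frame features (keys and values), i.e. a global
    aggregation. Assumed form: scaled dot-product cross-attention."""
    dim = len(pred_feats[0])
    aggregated = []
    for query in pred_feats:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in frame_feats
        ]
        weights = softmax(scores)
        aggregated.append([
            sum(w * value[c] for w, value in zip(weights, frame_feats))
            for c in range(dim)
        ])
    return aggregated


def blend_with_image_attention(cs0, cs1, logits):
    """Blend the CS features of frame 0 and frame 1 with per-position weights.
    `logits` holds one (frame0, frame1) score pair per position; in a real
    network these would be learned, here they are supplied by hand."""
    fused = []
    for v0, v1, pair in zip(cs0, cs1, logits):
        a0, a1 = softmax(list(pair))
        fused.append([a0 * x + a1 * y for x, y in zip(v0, v1)])
    return fused


# Toy data: 2 predicted-frame positions, 3 positions per input frame, 2 channels.
pred = [[1.0, 0.0], [0.0, 1.0]]
frame0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame1 = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]

cs0 = cross_similarity(pred, frame0)
cs1 = cross_similarity(pred, frame1)

# Equal logits weight both frames equally; a large gap effectively picks one
# frame, which is how a position occluded in the other frame could be handled.
fused_equal = blend_with_image_attention(cs0, cs1, [(0.0, 0.0), (0.0, 0.0)])
fused_frame0 = blend_with_image_attention(cs0, cs1, [(50.0, -50.0), (50.0, -50.0)])
```

With equal logits the fused features are the plain average of the two frames' CS features, while strongly skewed logits reproduce one frame's CS features almost exactly.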

updated: Fri Dec 02 2022 02:48:37 GMT+0000 (UTC)

published: Fri Jul 08 2022 21:38:54 GMT+0000 (UTC)

arXiv
