Self-supervised Learning of Event-guided Video Frame Interpolation for Rolling Shutter Frames

Yunfan Lu; Guoqiang Liang; Lin Wang

This paper makes the first attempt to tackle the challenging task of recovering latent global shutter (GS) frames at an arbitrary frame rate from two consecutive rolling shutter (RS) frames, guided by event camera data. Events possess high temporal resolution, which benefits video frame interpolation (VFI), but one hurdle in tackling this task is the lack of paired GS frames. Another challenge is that RS frames are susceptible to distortion when capturing moving objects. To this end, we propose a novel self-supervised framework that leverages events to guide RS frame correction and VFI jointly. Our key idea is to estimate the displacement field (DF), i.e., the non-linear, dense 3D spatiotemporal information of all pixels during the exposure time, which allows reciprocal reconstruction between RS and GS frames as well as arbitrary frame rate VFI. Specifically, we propose a displacement field estimation (DFE) module that estimates the spatiotemporal motion from events in order to correct the RS distortion and interpolate the GS frames in one step. We then combine the input RS frames and the DF to learn a mapping for RS-to-GS frame interpolation. However, as this mapping is highly under-constrained, we couple it with an inverse mapping (i.e., GS-to-RS) and RS frame warping (i.e., RS-to-RS) for self-supervision. As labeled datasets for evaluation are lacking, we generate two synthetic datasets and collect a real-world dataset to train and test our method. Experimental results show that our method achieves performance comparable to or better than prior supervised methods.
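To make the RS-to-GS and GS-to-RS coupling concrete, the following toy sketch may help. It is not the paper's network: it assumes a known, constant horizontal velocity and integer per-row shifts, whereas the abstract describes a DFE module that estimates a non-linear, dense displacement field from events. All function names (`row_times`, `displacement_field`, `warp_rows`, `rs_to_gs`, `gs_to_rs`) and the normalized row-timing model are hypothetical and chosen only for illustration.

```python
import numpy as np

def row_times(height, t_start=0.0, t_end=1.0):
    """Assumed RS timing model: row r is captured at a time that grows linearly with r."""
    return np.linspace(t_start, t_end, height, endpoint=False)

def displacement_field(height, velocity, t_target):
    """Per-row horizontal shift (integer pixels) from each row's capture time to t_target.

    Assumption: constant horizontal velocity in pixels per unit time.
    """
    times = row_times(height)
    return np.rint(velocity * (t_target - times)).astype(int)

def warp_rows(frame, shifts):
    """Shift every row horizontally by its own displacement (cyclic, for simplicity)."""
    out = np.empty_like(frame)
    for r, s in enumerate(shifts):
        out[r] = np.roll(frame[r], s)
    return out

def rs_to_gs(rs, velocity, t_target):
    """Forward mapping: move every RS row to the common GS timestamp t_target."""
    return warp_rows(rs, displacement_field(rs.shape[0], velocity, t_target))

def gs_to_rs(gs, velocity, t_source):
    """Inverse mapping: re-apply the per-row RS timing to a GS frame taken at t_source."""
    return warp_rows(gs, -displacement_field(gs.shape[0], velocity, t_source))

# A vertical bar moving right at 8 px per unit time, as seen by a GS camera at t = 0.
gs0 = np.zeros((8, 16))
gs0[:, 4] = 1.0

rs = gs_to_rs(gs0, velocity=8.0, t_source=0.0)      # simulated RS frame: the bar is skewed
gs_mid = rs_to_gs(rs, velocity=8.0, t_target=0.5)   # GS frame at an arbitrary timestamp
cycle = gs_to_rs(gs_mid, velocity=8.0, t_source=0.5)

# Self-supervision signal: the RS frame re-synthesized from the GS estimate
# should match the RS input, so no GS ground truth is needed for this term.
cycle_loss = np.abs(cycle - rs).mean()
```

In this toy setting the cycle loss vanishes when the displacement is correct and grows when it is wrong, which is the intuition behind using the inverse mapping as a training signal. In the paper's setting the displacement field is learned and non-linear rather than given in closed form.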

updated: Tue Jun 27 2023 14:30:25 GMT+0000 (UTC)

published: Tue Jun 27 2023 14:30:25 GMT+0000 (UTC)

arXiv
