Combining Internal and External Constraints for Unrolling Shutter in Videos

Eyal Naor; Itai Antebi; Shai Bagon; Michal Irani

Videos captured by rolling-shutter (RS) cameras contain spatially distorted frames, and these distortions become significant under fast camera or scene motion. Undoing the effects of RS is sometimes treated as a spatial problem, in which objects must be rectified or displaced to generate the correct global-shutter (GS) frame. However, the cause of the RS effect is inherently temporal, not spatial. In this paper we propose a space-time solution to the RS problem. We observe that, despite the severe differences between their xy frames, an RS video and its corresponding GS video tend to share the exact same xt slices, up to a known sub-frame temporal shift. Moreover, they share the same distribution of small 2D xt-patches, despite the strong temporal aliasing within each video. This allows us to constrain the GS output video using video-specific constraints imposed by the RS input video. Our algorithm is composed of three main components: (i) dense temporal upsampling between consecutive RS frames using an off-the-shelf method (trained on regular video sequences), from which we extract GS "proposals"; (ii) learning to correctly merge an ensemble of such GS "proposals" using a dedicated MergeNet; and (iii) a video-specific zero-shot optimization that imposes similarity of xt-patches between the GS output video and the RS input video. Our method obtains state-of-the-art results on benchmark datasets, both numerically and visually, despite being trained on a small synthetic RS/GS dataset. Moreover, it generalizes well to new, complex RS videos with motion types outside the distribution of the training set (e.g., complex non-rigid motions), which competing methods trained on much more data cannot handle well. We attribute these generalization capabilities to the combination of external and internal constraints.
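The xt-slice observation in the abstract can be illustrated with a small synthetic sketch. This is not the authors' code. It assumes a hypothetical continuous scene function `scene(x, y, t)` and a simple readout model in which row y of an RS frame is exposed y / height of a frame period after row 0. Under those assumptions, the xt slice of the RS video at a given row coincides with the xt slice of a GS video sampled with the matching sub-frame time offset, even though the xy frames differ.

```python
import numpy as np


def scene(x, y, t):
    """Hypothetical continuous space-time scene: a pattern translating in x."""
    return np.sin(0.3 * (x - 4.0 * t)) + 0.5 * np.cos(0.2 * y)


def _grid(num_frames, height, width):
    """Broadcastable (t, y, x) coordinate arrays for a (T, H, W) video."""
    t = np.arange(num_frames, dtype=float)[:, None, None]
    y = np.arange(height, dtype=float)[None, :, None]
    x = np.arange(width, dtype=float)[None, None, :]
    return t, y, x


def gs_video(num_frames, height, width, t_offset=0.0):
    """Global shutter: all rows of frame k are sampled at time k + t_offset."""
    t, y, x = _grid(num_frames, height, width)
    return scene(x, y, t + t_offset)


def rs_video(num_frames, height, width):
    """Rolling shutter (assumed model): row y of frame k is sampled at
    time k + y / height, i.e. readout spans one full frame period."""
    t, y, x = _grid(num_frames, height, width)
    return scene(x, y, t + y / height)


def xt_slice(video, row):
    """Fix a row y and return the 2D slice indexed by (t, x)."""
    return video[:, row, :]


if __name__ == "__main__":
    T, H, W = 12, 16, 32
    row = 8
    rs = rs_video(T, H, W)
    gs = gs_video(T, H, W)
    # GS video sampled with the sub-frame shift that matches this row.
    gs_shifted = gs_video(T, H, W, t_offset=row / H)

    same_xt = np.allclose(xt_slice(rs, row), xt_slice(gs_shifted, row))
    same_xy = np.allclose(rs[0], gs[0])
    print("xt slices match after sub-frame shift:", same_xt)
    print("xy frames match:", same_xy)
```

In this toy setting the shift is known exactly because the readout model is assumed. With a real camera, the per-row delay depends on the sensor's readout time.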

updated: Sun Jul 24 2022 12:01:27 GMT+0000 (UTC)

published: Sun Jul 24 2022 12:01:27 GMT+0000 (UTC)

arXiv
