Learning Temporally and Semantically Consistent Unpaired Video-to-video Translation Through Pseudo-Supervision From Synthetic Optical Flow

Kaihong Wang; Kumar Akash; Teruhisa Misu

Unpaired video-to-video translation aims to translate videos between a source and a target domain without the need for paired training data, which makes it more practical for real applications. Unfortunately, the translated videos generally suffer from temporal and semantic inconsistency. To address this, many existing works adopt spatiotemporal consistency constraints that incorporate temporal information based on motion estimation. However, inaccuracies in motion estimation degrade the quality of the guidance towards spatiotemporal consistency, which leads to unstable translation. In this work, we propose a novel paradigm that regularizes spatiotemporal consistency by synthesizing motions in input videos with generated optical flow instead of estimating them. The synthetic motion can therefore be applied in the regularization paradigm to keep motions consistent across domains without the risk of errors in motion estimation. We then use our unsupervised recycle loss and unsupervised spatial loss, guided by the pseudo-supervision provided by the synthetic optical flow, to accurately enforce spatiotemporal consistency in both domains. Experiments show that our method is versatile in various scenarios and achieves state-of-the-art performance in generating temporally and semantically consistent videos. Code is available at: https://github.com/wangkaihong/Unsup_Recycle_GAN/.
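
The idea of pseudo-supervision from synthetic optical flow can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that); it is one plausible reading of the abstract, written in pure Python on tiny grayscale frames. All names here (`make_translation_flow`, `warp`, `unsupervised_spatial_loss`, `unsupervised_recycle_loss`) are hypothetical, the flow is assumed to be a simple constant translation, and the loss forms are assumptions rather than the paper's exact definitions. Because the flow is synthesized rather than estimated, the warped frame is an exact pseudo "future frame", and a generator can be penalized when translating and warping do not commute.

```python
def make_translation_flow(height, width, dy, dx):
    """Synthetic flow: every pixel moves by the same (dy, dx).
    Assumption: a constant translation stands in for the generated flow."""
    return [[(dy, dx) for _ in range(width)] for _ in range(height)]


def warp(frame, flow):
    """Backward-warp a frame: out[y][x] = frame[y - dy][x - dx],
    clamping source coordinates at the image border."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y - dy, 0), h - 1)
            sx = min(max(x - dx, 0), w - 1)
            out[y][x] = frame[sy][sx]
    return out


def l1(a, b):
    """Mean absolute difference between two frames of equal size."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def unsupervised_spatial_loss(generator, frame, flow):
    """Assumed form: translating the warped frame should match
    warping the translated frame, | G(W(x, f)) - W(G(x), f) |."""
    return l1(generator(warp(frame, flow)), warp(generator(frame), flow))


def unsupervised_recycle_loss(gen_xy, gen_yx, frame, flow):
    """Assumed form: translate, warp with the synthetic flow, translate
    back, and compare with the warped input, | W(x, f) - G_yx(W(G_xy(x), f)) |."""
    pseudo_future = warp(frame, flow)
    recycled = gen_yx(warp(gen_xy(frame), flow))
    return l1(pseudo_future, recycled)


# Toy usage: a per-pixel "generator" commutes with warping,
# so both losses vanish for it.
frame = [[float(y * 5 + x) for x in range(5)] for y in range(4)]
flow = make_translation_flow(4, 5, dy=1, dx=2)
to_target = lambda f: [[2.0 * v + 1.0 for v in row] for row in f]
to_source = lambda f: [[(v - 1.0) / 2.0 for v in row] for row in f]
spatial = unsupervised_spatial_loss(to_target, frame, flow)
recycle = unsupervised_recycle_loss(to_target, to_source, frame, flow)
```

In this sketch the supervision signal is exact by construction, since the flow that produced the pseudo future frame is known. A real model would replace the toy lambdas with learned generators and the nested loops with a differentiable warping operation.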

updated: Fri Aug 05 2022 20:22:23 GMT+0000 (UTC)

published: Sat Jan 15 2022 01:10:34 GMT+0000 (UTC)

arXiv

