Shortcut-V2V: Compression Framework for Video-to-Video Translation based on Temporal Redundancy Reduction

Chaeyeon Chung; Yeojeong Park; Seunghwan Choi; Munkhsoyol Ganbat; Jaegul Choo

Video-to-video translation aims to generate video frames of a target domain from an input video. Despite its usefulness, existing networks require enormous computation, so model compression is needed before they can be widely used. While compression methods exist that improve computational efficiency in various image and video tasks, a generally applicable compression method for video-to-video translation has received little study. In response, we present Shortcut-V2V, a general-purpose compression framework for video-to-video translation. Shortcut-V2V avoids full inference for every neighboring video frame by approximating the intermediate features of the current frame from those of the previous frame. Moreover, in our framework, a newly proposed block called AdaBD adaptively blends and deforms features of neighboring frames, which enables more accurate prediction of the intermediate features. We conduct quantitative and qualitative evaluations using well-known video-to-video translation models on various tasks to demonstrate the general applicability of our framework. The results show that Shortcut-V2V achieves performance comparable to the original video-to-video translation model while saving 3.2-5.7x computational cost and 7.8-44x memory at test time.
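The idea of skipping full inference on neighboring frames can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `translate_video`, the split into `encode`, `heavy_middle`, and `decode` stages, the fixed `interval` between fully processed frames, and the `shortcut` callable (standing in for a lightweight block such as AdaBD) are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Feat = TypeVar("Feat")


def translate_video(
    frames: Sequence[Frame],
    encode: Callable[[Frame], Feat],         # cheap early layers (hypothetical split)
    heavy_middle: Callable[[Feat], Feat],    # expensive layers to be skipped
    shortcut: Callable[[Feat, Feat], Feat],  # cheap approximation (assumed stand-in)
    decode: Callable[[Feat], Frame],         # cheap late layers (hypothetical split)
    interval: int = 3,                       # assumed spacing of fully processed frames
) -> List[Frame]:
    """Run the expensive layers only on every `interval`-th frame.

    For the other frames, the intermediate feature is approximated from
    the previous frame's feature and the current frame's shallow feature.
    """
    outputs: List[Frame] = []
    prev_feat = None
    for t, frame in enumerate(frames):
        shallow = encode(frame)
        if t % interval == 0:
            # Full inference through the expensive layers.
            feat = heavy_middle(shallow)
        else:
            # Approximate the feature instead of recomputing it.
            feat = shortcut(prev_feat, shallow)
        outputs.append(decode(feat))
        prev_feat = feat
    return outputs
```

With six frames and `interval=3`, `heavy_middle` is called only twice in this sketch; the other four frames go through the `shortcut` path. The savings reported in the abstract depend on the actual models and block design, which this toy loop does not reproduce.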

updated: Tue Aug 15 2023 19:50:38 GMT+0000 (UTC)

published: Tue Aug 15 2023 19:50:38 GMT+0000 (UTC)

arXiv
