SVCNet: Scribble-based Video Colorization Network with Temporal Aggregation

Yuzhi Zhao; Lai-Man Po; Kangcheng Liu; Xuehui Wang; Wing-Yin Yu; Pengfei Xian; Yujia Zhang; Mengyang Liu

In this paper, we propose SVCNet, a scribble-based video colorization network with temporal aggregation. It colorizes monochrome videos based on user-given color scribbles and addresses three common issues in scribble-based video colorization: colorization vividness, temporal consistency, and color bleeding. To improve colorization quality and strengthen temporal consistency, SVCNet adopts two sequential sub-networks for precise colorization and temporal smoothing, respectively. The first stage includes a pyramid feature encoder that incorporates color scribbles with a grayscale frame, and a semantic feature encoder that extracts semantics. The second stage refines the output of the first stage by aggregating information from neighboring colorized frames (as short-range connections) and the first colorized frame (as a long-range connection). To alleviate color bleeding artifacts, we learn video colorization and segmentation simultaneously. Furthermore, we perform the majority of operations at a fixed small image resolution and use a Super-resolution Module at the tail of SVCNet to recover the original size, which allows SVCNet to handle different image resolutions at inference time. Finally, we evaluate SVCNet on the DAVIS and Videvo benchmarks. The experimental results demonstrate that SVCNet produces higher-quality and more temporally consistent videos than other well-known video colorization approaches. The code and models can be found at https://github.com/zhaoyuzhi/SVCNet.
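The two-stage data flow described in the abstract can be illustrated with a minimal control-flow sketch. This is not the authors' implementation: the names `colorize_video`, `neighbor_indices`, `stage1`, `stage2`, `downscale`, `upscale`, and the `radius` parameter are all hypothetical, the sub-networks are passed in as plain callables, and the choice to take the long-range reference from the first stage-1 output is an assumption, since the abstract does not specify these details.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def neighbor_indices(t: int, num_frames: int, radius: int = 1) -> List[int]:
    """Indices within `radius` of frame t, excluding t, clipped to the clip bounds.

    The window radius is an assumption made for illustration.
    """
    lo = max(0, t - radius)
    hi = min(num_frames - 1, t + radius)
    return [i for i in range(lo, hi + 1) if i != t]


def colorize_video(
    gray_frames: Sequence[Frame],
    scribbles: Sequence[Frame],
    stage1: Callable[[Frame, Frame], Frame],
    stage2: Callable[[Frame, List[Frame], Frame], Frame],
    downscale: Callable[[Frame], Frame],
    upscale: Callable[[Frame], Frame],
    radius: int = 1,
) -> List[Frame]:
    """Hypothetical pipeline: fixed small resolution, two stages, then upscaling."""
    # Work at a fixed small resolution for most operations.
    small_gray = [downscale(g) for g in gray_frames]
    small_scribbles = [downscale(s) for s in scribbles]

    # Stage 1: colorize each frame independently from its scribbles.
    coarse = [stage1(g, s) for g, s in zip(small_gray, small_scribbles)]

    # Stage 2: refine each frame using neighboring colorized frames
    # (short-range) and the first colorized frame (long-range).
    refined = []
    for t, frame in enumerate(coarse):
        short_range = [coarse[i] for i in neighbor_indices(t, len(coarse), radius)]
        long_range = coarse[0]  # assumption: taken from the stage-1 output
        refined.append(stage2(frame, short_range, long_range))

    # Stand-in for the super-resolution step that restores the original size.
    return [upscale(r) for r in refined]
```

Passing the stages as callables keeps the sketch independent of any deep learning framework; in practice each callable would wrap a trained network.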

updated: Fri Aug 04 2023 14:15:39 GMT+0000 (UTC)

published: Tue Mar 21 2023 04:42:39 GMT+0000 (UTC)

arXiv
