Sketch Me A Video

Haichao Zhang; Gang Yu; Tao Chen; Guozhong Luo

Video creation has long been an attractive yet challenging task for artists. With advances in deep learning, recent works use deep convolutional neural networks to synthesize a video with the aid of a guiding video, and have achieved promising results. However, acquiring guiding videos, or other forms of guiding temporal information, is costly and difficult in practice. In this work we therefore introduce a new video synthesis task that takes only two rough, badly drawn sketches as input and creates a realistic portrait video. We propose a two-stage Sketch-to-Video model with two key novelties: 1) A feature retrieve and projection (FRP) module, which partitions the input sketch into different parts and uses these parts both to synthesize a realistic start or end frame and to generate rich semantic features. It is designed to alleviate the sketch out-of-domain problem caused by the arbitrary free-form sketch styles of different users. 2) A motion projection followed by a feature blending module, which projects a video (used only in the training phase) into a motion space modeled by a normal distribution and blends the motion variables with the semantic features extracted above. It is designed to compensate for the guiding temporal information that is missing in the test phase. Experiments on a combination of the CelebAMask-HQ and VoxCeleb2 datasets validate that our method achieves good quantitative and qualitative results in synthesizing high-quality videos from two rough, badly drawn sketches.
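
The second novelty in the abstract, projecting a training video into a normally distributed motion space and blending the resulting motion variable with semantic features, can be illustrated with a minimal sketch. The abstract gives no architectural details, so everything below is an assumption made for illustration: the VAE-style reparameterized sampling, the KL regularizer toward a standard normal, the function names, and in particular the hypothetical `blend` rule, which is not the authors' blending module.

```python
import math
import random


def sample_motion(mu, log_var, rng):
    """Training phase (assumed VAE-style): draw z = mu + sigma * eps, eps ~ N(0, 1).

    mu and log_var stand in for the output of a video encoder, which is not shown.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def sample_prior(dim, rng):
    """Test phase: no guiding video exists, so draw z from the N(0, I) prior."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)) summed over dimensions (assumed regularizer)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def blend(start_feat, end_feat, z, t):
    """Hypothetical blending rule, for illustration only.

    Interpolates start/end frame features at time t in [0, 1] and adds a
    motion offset weighted by t * (1 - t), so it vanishes at both endpoints.
    """
    w = t * (1.0 - t)
    return [(1.0 - t) * s + t * e + w * zi
            for s, e, zi in zip(start_feat, end_feat, z)]


if __name__ == "__main__":
    rng = random.Random(0)
    start, end = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
    z = sample_prior(3, rng)  # test phase: motion comes from the prior
    frames = [blend(start, end, z, k / 4) for k in range(5)]
    print(frames[0], frames[-1])  # endpoints reproduce the start/end features
```

In this toy setup the first and last blended feature vectors equal the start and end features regardless of the sampled motion variable, which mirrors the idea that the two sketches pin down the endpoints while the sampled motion only shapes the frames in between.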

updated: Sun Oct 10 2021 05:40:11 GMT+0000 (UTC)

published: Sun Oct 10 2021 05:40:11 GMT+0000 (UTC)

arXiv
