Video Content Swapping Using GAN

Tingfung Lau; Sailun Xu; Xinze Wang

Video generation is an interesting problem in computer vision. It is quite popular for data augmentation, special effects in movies, AR/VR, and so on. With the advances in deep learning, many deep generative models have been proposed to solve this task. These deep generative models provide a way to utilize all the unlabeled images and videos online, since they can learn deep feature representations in an unsupervised manner. These models can also generate different kinds of images, which is of great value for visual applications. However, generating a video is much more challenging, since we need to model not only the appearance of objects in the video but also their temporal motion. In this work, we break down each frame in the video into content and pose. We first extract the pose information from a video using a pre-trained human pose detector, and then use a generative model to synthesize the video based on the content code and pose code.
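To make the content/pose decomposition concrete, the following is a minimal sketch of the data flow only, not the authors' implementation. Everything in it is an assumption made for illustration: the toy frame size, the code dimensions, the random untrained weights, the stand-in functions `content_encoder`, `pose_detector`, and `generator`, and the choice to condition the generator by concatenating the two codes. The abstract does not specify any of these details.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only for illustration (assumptions, not from the paper).
H, W = 16, 16            # frame height and width
CONTENT_DIM = 8          # length of the content code
NUM_JOINTS = 5           # number of body keypoints
POSE_DIM = 2 * NUM_JOINTS

# Random, untrained weights standing in for learned networks.
W_content = rng.standard_normal((H * W, CONTENT_DIM)) * 0.1
W_gen = rng.standard_normal((CONTENT_DIM + POSE_DIM, H * W)) * 0.1


def content_encoder(frame):
    """Hypothetical encoder: map a frame to a content (appearance) code."""
    return frame.reshape(-1) @ W_content


def pose_detector(frame):
    """Placeholder for a pre-trained human pose detector.

    Returns normalized (row, col) coordinates of the brightest pixel in each
    of NUM_JOINTS horizontal bands. A real detector would return keypoints.
    """
    coords = []
    row0 = 0
    for band in np.array_split(frame, NUM_JOINTS, axis=0):
        r, c = np.unravel_index(np.argmax(band), band.shape)
        coords.extend([(row0 + r) / H, c / W])
        row0 += band.shape[0]
    return np.array(coords)


def generator(content_code, pose_code):
    """Hypothetical generator: synthesize a frame from both codes."""
    z = np.concatenate([content_code, pose_code])
    return np.tanh(z @ W_gen).reshape(H, W)


def swap_content(source_frame, driving_video):
    """Keep the content of source_frame, take the pose from each driving frame."""
    content = content_encoder(source_frame)
    return np.stack([generator(content, pose_detector(f)) for f in driving_video])


source_frame = rng.random((H, W))        # frame supplying the content
driving_video = rng.random((4, H, W))    # four frames supplying the motion
swapped = swap_content(source_frame, driving_video)
print(swapped.shape)
```

The content code is computed once and reused for every output frame, while the pose code changes per frame; this is the sense in which content is swapped while the motion of the driving video is kept.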

updated: Sun Nov 21 2021 23:01:58 GMT+0000 (UTC)

published: Sun Nov 21 2021 23:01:58 GMT+0000 (UTC)

arXiv
