Video Coding Using Learned Latent GAN Compression

Mustafa Shukor; Bharath Bushan Damodaran; Xu Yao; Pierre Hellier

In this paper, we propose a new paradigm for facial video compression. We leverage the generative capacity of GANs such as StyleGAN to represent and compress a video, covering both intra and inter compression. Each frame is inverted into the latent space of StyleGAN, from which an optimal compression is learned. To do so, a diffeomorphic latent representation is learned using a normalizing flow model, in which an entropy model can be optimized for image coding. In addition, we propose a new perceptual loss that is more efficient than existing counterparts. Finally, an entropy model for video inter coding with residuals is also learned in the previously constructed latent representation. Our method (SGANC) is simple, faster to train, and achieves better results for image and video coding than state-of-the-art codecs such as VTM and AV1 and recent deep learning techniques. In particular, it drastically reduces perceptual distortion at low bit rates.
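The intra/inter structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each frame has already been reduced to a short latent vector (here plain lists of floats standing in for StyleGAN latents), it replaces the learned normalizing flow and entropy models with uniform scalar quantization and an empirical entropy estimate, and every function name and the `step` parameter are hypothetical. It shows only the general idea of coding the first latent directly and each later latent as a residual against the previous reconstruction.

```python
import math
from collections import Counter


def quantize(vec, step=0.5):
    """Uniform scalar quantization to integer symbols (stand-in for a learned model)."""
    return [round(v / step) for v in vec]


def dequantize(symbols, step=0.5):
    """Map integer symbols back to reconstructed values."""
    return [s * step for s in symbols]


def empirical_bits(symbols):
    """Ideal code length in bits under the empirical symbol distribution."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def encode_video(latents, step=0.5):
    """Intra-code the first latent; code later latents as quantized residuals.

    Residuals are taken against the previous *reconstruction*, so the
    encoder and decoder stay in sync and quantization error does not drift.
    """
    stream, prev = [], None
    for z in latents:
        if prev is None:
            symbols = quantize(z, step)
            prev = dequantize(symbols, step)
        else:
            symbols = quantize([a - b for a, b in zip(z, prev)], step)
            prev = [p + r for p, r in zip(prev, dequantize(symbols, step))]
        stream.append(symbols)
    return stream


def decode_video(stream, step=0.5):
    """Rebuild the latent sequence from the intra symbols and the residual symbols."""
    recon, prev = [], None
    for symbols in stream:
        values = dequantize(symbols, step)
        prev = values if prev is None else [p + r for p, r in zip(prev, values)]
        recon.append(prev)
    return recon


if __name__ == "__main__":
    # Toy latents for three consecutive frames (illustrative values only).
    latents = [[0.1, 1.0, -2.3], [0.2, 1.1, -2.2], [0.3, 1.1, -2.0]]
    stream = encode_video(latents)
    recon = decode_video(stream)
    for frame_symbols in stream:
        print(frame_symbols, empirical_bits(frame_symbols))
    print(recon)
```

When consecutive latents are close, the residual symbols cluster around zero, which is what makes them cheap to code under an entropy model. In the method described above, that model is learned rather than estimated from symbol counts.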

updated: Sat Jul 09 2022 19:07:43 GMT+0000 (UTC)

published: Sat Jul 09 2022 19:07:43 GMT+0000 (UTC)

arXiv
