Deep Video Coding with Dual-Path Generative Adversarial Network

Tiesong Zhao; Weize Feng; Hongji Zeng; Yuzhen Niu; Jiaying Liu

Deep-learning-based video coding has attracted substantial attention for its great potential to squeeze out the spatial-temporal redundancies of video sequences. This paper proposes an efficient codec, namely the dual-path generative adversarial network-based video codec (DGVC). First, we propose a dual-path enhancement with generative adversarial network (DPEG) to reconstruct the details of compressed video. The DPEG consists of two paths: an α-path built from an auto-encoder and convolutional long short-term memory (ConvLSTM), which facilitates the reconstruction of structure features with a large receptive field and multi-frame references, and a β-path of residual attention blocks, which facilitates the reconstruction of local texture features. Both paths are fused and co-trained by a generative-adversarial process. Second, we reuse the DPEG network in both the motion compensation and quality enhancement modules, which are further combined with motion estimation and entropy coding modules in our DGVC framework. Third, we employ joint training of deep video compression and enhancement to further improve the rate-distortion (RD) performance. Compared with x265 in LDP very fast mode, our DGVC reduces the average bits-per-pixel (bpp) by 39.39%/54.92% at the same PSNR/MS-SSIM, which outperforms state-of-the-art deep video codecs by a considerable margin.
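The headline result is stated as a bpp reduction at equal quality. As a minimal sketch of what those quantities mean, the Python below computes bits-per-pixel for a coded sequence, the relative saving against an anchor codec, and PSNR from a mean squared error. The function names and all numeric inputs are hypothetical and chosen only for illustration; they are not taken from the paper. The abstract does not say how its average is formed, so this single-point comparison should not be read as the authors' evaluation procedure.

```python
import math


def bits_per_pixel(total_bits, num_frames, height, width):
    """Average number of coded bits spent per pixel over a sequence."""
    return total_bits / (num_frames * height * width)


def bpp_reduction_percent(anchor_bpp, test_bpp):
    """Relative bit saving of the test codec against the anchor, in percent."""
    return 100.0 * (anchor_bpp - test_bpp) / anchor_bpp


def psnr(mse, max_value=255.0):
    """PSNR in dB from a mean squared error (8-bit samples by default)."""
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical bit counts for the same 10-frame, 416x240 clip (not paper data).
anchor = bits_per_pixel(total_bits=2_000_000, num_frames=10, height=240, width=416)
test = bits_per_pixel(total_bits=1_200_000, num_frames=10, height=240, width=416)
saving = bpp_reduction_percent(anchor, test)

print(f"anchor bpp: {anchor:.4f}, test bpp: {test:.4f}, saving: {saving:.2f}%")
```

A saving computed this way is only meaningful when both codecs reach the same quality (the same PSNR or MS-SSIM), which is the condition the abstract attaches to its 39.39%/54.92% figures.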

updated: Mon Nov 29 2021 11:39:28 GMT+0000 (UTC)

published: Mon Nov 29 2021 11:39:28 GMT+0000 (UTC)

arXiv
