Learned Image Compression with Discretized Gaussian-Laplacian-Logistic Mixture Model and Concatenated Residual Modules

Haisheng Fu; Feng Liang; Jianping Lin; Bing Li; Mohammad Akbari; Jie Liang; Guohe Zhang; Dong Liu; Chengjie Tu; Jingning Han



Recently, deep learning-based image compression methods have made significant progress and have gradually outperformed traditional approaches, including the latest standard, Versatile Video Coding (VVC), in both PSNR and MS-SSIM. Two key components of learned image compression frameworks are the entropy model of the latent representations and the encoding/decoding network architecture. Various entropy models have been proposed, such as autoregressive, softmax, logistic mixture, Gaussian mixture, and Laplacian models. Existing schemes use only one of these models. However, because images are so diverse, a single model is not optimal for all images, or even for different regions of one image. In this paper, we propose a more flexible discretized Gaussian-Laplacian-Logistic mixture model (GLLMM) for the latent representations, which adapts more accurately to different content across images and across regions within one image. In addition, for the encoding/decoding network design, we propose concatenated residual blocks (CRB), in which multiple residual blocks are serially connected with additional shortcut connections. The CRB improves the learning ability of the network, which further improves compression performance. Experimental results on the Kodak and Tecnick datasets show that the proposed scheme outperforms all the state-of-the-art learning-based methods and existing compression standards, including VVC intra coding (4:4:4 and 4:2:0), in terms of PSNR and MS-SSIM.
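
The idea of a discretized Gaussian-Laplacian-Logistic mixture can be illustrated with a minimal sketch. The abstract does not give the exact parameterization, so everything here is an assumption made for illustration: the function names, the use of one component per family, the fixed weights and scales, and the convention that the probability of an integer symbol y is the mixture CDF evaluated over the bin [y - 0.5, y + 0.5]. In a learned codec these parameters would presumably be predicted per latent element by the network; this sketch uses constants and only the Python standard library.

```python
import math

def gaussian_cdf(x, mu, scale):
    # CDF of a Gaussian with mean mu and standard deviation scale.
    return 0.5 * (1.0 + math.erf((x - mu) / (scale * math.sqrt(2.0))))

def laplace_cdf(x, mu, scale):
    # CDF of a Laplace distribution with location mu and scale parameter.
    z = (x - mu) / scale
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

def logistic_cdf(x, mu, scale):
    # CDF of a logistic distribution, written to avoid overflow in exp().
    z = (x - mu) / scale
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def discretized_mixture_pmf(y, components):
    # components: list of (weight, cdf, mu, scale); weights are assumed
    # to be non-negative and to sum to 1 (hypothetical layout).
    # Probability mass of integer y = mixture CDF over [y - 0.5, y + 0.5].
    return sum(
        w * (cdf(y + 0.5, mu, s) - cdf(y - 0.5, mu, s))
        for w, cdf, mu, s in components
    )

def bits_for_symbol(y, components):
    # Ideal code length in bits for symbol y under the mixture.
    return -math.log2(discretized_mixture_pmf(y, components))

# Illustrative parameters only; not taken from the paper.
example_components = [
    (0.4, gaussian_cdf, 0.0, 1.0),
    (0.3, laplace_cdf, 0.0, 1.0),
    (0.3, logistic_cdf, 0.0, 1.0),
]

total_mass = sum(
    discretized_mixture_pmf(y, example_components) for y in range(-200, 201)
)
```

Because each family has a different tail shape, mixing them lets the entropy model fit latents that a single family would model poorly, which is the motivation stated in the abstract. Summing the mass over a wide range of integers, as `total_mass` does, is a quick sanity check that the discretization yields a valid distribution.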

updated: Wed Jul 14 2021 02:54:22 GMT+0000 (UTC)

published: Wed Jul 14 2021 02:54:22 GMT+0000 (UTC)

arXiv
