Causal Contextual Prediction for Learned Image Compression

Zongyu Guo; Zhizheng Zhang; Runsen Feng; Zhibo Chen

Over the past several years, we have witnessed impressive progress in the field of learned image compression. Recent learned image codecs are commonly based on autoencoders that first encode an image into low-dimensional latent representations and then decode them for reconstruction. To capture spatial dependencies in the latent space, prior works exploit a hyperprior and a spatial context model to build an entropy model, which estimates the bit-rate for end-to-end rate-distortion optimization. However, such an entropy model is suboptimal in two respects: (1) it fails to capture spatially global correlations among the latents, and (2) cross-channel relationships of the latents remain underexplored. In this paper, we propose the concept of separate entropy coding, which leverages a serial decoding process for causal contextual entropy prediction in the latent space. We propose a causal context model that separates the latents across channels and makes use of cross-channel relationships to generate highly informative contexts. Furthermore, we propose a causal global prediction model, which is able to find global reference points for accurate predictions of unknown points. Both models facilitate entropy estimation without transmitting any overhead. In addition, we adopt a new separate attention module to build more powerful transform networks. Experimental results demonstrate that our full image compression model outperforms the standard VVC/H.266 codec on the Kodak dataset in terms of both PSNR and MS-SSIM, yielding state-of-the-art rate-distortion performance.
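To make the idea of causal, cross-channel entropy prediction more concrete, here is a minimal toy sketch; it is not the authors' model. It assumes integer-quantized latents split into channel groups that are decoded serially, and a Gaussian entropy model integrated over each quantization bin, as is common in learned compression. The predictor functions `no_context` and `previous_group_context`, the fixed scales, and the sample values are hypothetical and chosen only for illustration.

```python
import math


def gaussian_cdf(x, mean, scale):
    """CDF of a Gaussian with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def bits_for_symbol(y, mean, scale):
    """Estimated bits for an integer latent y: -log2 of the Gaussian mass on [y-0.5, y+0.5]."""
    p = gaussian_cdf(y + 0.5, mean, scale) - gaussian_cdf(y - 0.5, mean, scale)
    return -math.log2(max(p, 1e-9))  # clamp to avoid log(0)


def estimate_rate(channel_groups, predict_params):
    """Sum the estimated bits over channel groups processed in serial order.

    predict_params(decoded, n) sees only the groups decoded so far (causality)
    and returns n (mean, scale) pairs for the next group.
    """
    decoded = []
    total_bits = 0.0
    for group in channel_groups:
        params = predict_params(decoded, len(group))
        total_bits += sum(bits_for_symbol(y, m, s) for y, (m, s) in zip(group, params))
        decoded.append(group)  # this group becomes context for later groups
    return total_bits


def no_context(decoded, n):
    """Hypothetical baseline: a fixed zero-mean prior that ignores decoded groups."""
    return [(0.0, 2.0)] * n


def previous_group_context(decoded, n):
    """Hypothetical toy predictor: use the previous group's values as the mean."""
    if not decoded:
        return [(0.0, 2.0)] * n  # first group has no context available
    return [(float(v), 1.0) for v in decoded[-1]]


# Illustrative latents: three channel groups with strongly correlated values.
groups = [[3, -2, 0, 4], [3, -2, 1, 4], [2, -2, 1, 4]]
rate_no_ctx = estimate_rate(groups, no_context)
rate_ctx = estimate_rate(groups, previous_group_context)
print(f"without context: {rate_no_ctx:.2f} bits, with cross-channel context: {rate_ctx:.2f} bits")
```

Because the toy groups are correlated, predicting each group from the one decoded before it concentrates probability mass near the true values and lowers the estimated rate. The decoder can reproduce the same predictions from already-decoded groups, so no side information is needed, which is the property the abstract describes as "without transmitting any overhead".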

updated: Sun Oct 31 2021 05:06:18 GMT+0000 (UTC)

published: Thu Nov 19 2020 08:15:10 GMT+0000 (UTC)

arXiv
