Extreme Image Compression using Fine-tuned VQGAN Models

Qi Mao; Tinghan Yang; Yinuo Zhang; Shuyin Pan; Meng Wang; Shiqi Wang; Siwei Ma

Recent advances in generative compression methods have substantially improved the perceptual quality of compressed data, especially at low bitrates. Nevertheless, their efficacy and applicability at extreme compression ratios (<0.1 bpp) remain limited. In this work, we propose a simple yet effective coding framework that introduces vector quantization (VQ)-based generative models into the image compression domain. The main insight is that the codebook learned by the VQGAN model has strong expressive capacity, enabling efficient compression of continuous information in the latent space while maintaining reconstruction quality. Specifically, an image can be represented as a map of VQ indices by finding the nearest codeword for each latent vector, and these indices can be encoded into a bitstream using lossless compression methods. We then propose clustering a pre-trained large-scale codebook into smaller codebooks using the K-means algorithm. This allows images to be represented as VQ-index maps with different index ranges, yielding variable bitrates and different levels of reconstruction quality. Extensive qualitative and quantitative experiments on various datasets demonstrate that the proposed framework outperforms state-of-the-art codecs in terms of perceptual quality-oriented metrics and human perception at extremely low bitrates.
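The pipeline described in the abstract has three steps. Each latent vector is mapped to its nearest codeword to form a VQ-index map. The index map is coded losslessly. A pre-trained codebook is clustered with K-means into a smaller one to vary the bitrate. All three steps can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation. The random codebook and latents stand in for a real VQGAN encoder and its learned codebook. zlib stands in for whatever lossless coder the paper actually uses. The helper names (`kmeans_codebook`, `encode_indices`, `decode_indices`, `fixed_length_bpp`) are hypothetical. No decoder or fine-tuning is modeled.

```python
import math
import random
import zlib
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codeword closest to `vec` (squared L2 distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, codebook[k])),
    )


def quantize(latents: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map every latent vector to its nearest-codeword index (the VQ-index map)."""
    return [nearest_codeword(v, codebook) for v in latents]


def kmeans_codebook(codebook: Sequence[Vector], k: int, iters: int = 20,
                    seed: int = 0) -> List[List[float]]:
    """Cluster a large codebook into k centroids with plain Lloyd's K-means."""
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(list(codebook), k)]
    for _ in range(iters):
        assign = quantize(codebook, centroids)
        for j in range(k):
            members = [c for c, a in zip(codebook, assign) if a == j]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def encode_indices(indices: Sequence[int]) -> bytes:
    """Lossless-coding stand-in: 2 bytes per index, then zlib (assumes < 65536 codewords)."""
    raw = b"".join(i.to_bytes(2, "big") for i in indices)
    return zlib.compress(raw, 9)


def decode_indices(data: bytes) -> List[int]:
    """Invert encode_indices exactly."""
    raw = zlib.decompress(data)
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


def fixed_length_bpp(height: int, width: int, factor: int, codebook_size: int) -> float:
    """Bits per pixel if each index is sent with ceil(log2 K) bits and no entropy coding."""
    tokens = (height // factor) * (width // factor)
    return tokens * math.ceil(math.log2(codebook_size)) / (height * width)


if __name__ == "__main__":
    rng = random.Random(1)
    big = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(64)]      # toy "pre-trained" codebook
    small = kmeans_codebook(big, k=8)                                    # reduced codebook
    latents = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(16)]  # toy encoder output (4x4 grid)
    idx = quantize(latents, small)
    stream = encode_indices(idx)
    assert decode_indices(stream) == idx  # the index map survives coding exactly
    print(idx, len(stream), "bytes")
    print(fixed_length_bpp(256, 256, 16, 1024), fixed_length_bpp(256, 256, 16, 8))
```

Consider an illustrative setting of a 256x256 image, a 16x downsampling encoder, and 1,024 codewords. These values are assumed and are not taken from the paper. Under them, `fixed_length_bpp` gives 256 tokens x 10 bits / 65,536 pixels, about 0.039 bpp, before any entropy coding. Shrinking the codebook to 8 entries cuts this to 3 bits per token, about 0.012 bpp. This is how clustering into smaller codebooks trades bitrate against reconstruction fidelity.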

updated: Mon Jul 17 2023 06:14:19 GMT+0000 (UTC)

published: Mon Jul 17 2023 06:14:19 GMT+0000 (UTC)

arXiv