Extreme Generative Image Compression by Learning Text Embedding from Diffusion Models

Zhihong Pan; Xin Zhou; Hao Tian

Transferring large numbers of high-resolution images over limited bandwidth is an important but very challenging task. Image compression at extremely low bitrates (<0.1 bpp) has been studied, but it often produces low-quality images with heavy artifacts because of the strong constraint on the number of bits available for the compressed data. It is often said that a picture is worth a thousand words; on the other hand, language is very powerful at capturing the essence of an image in a short description. Building on the recent success of diffusion models for text-to-image generation, we propose a generative image compression method that demonstrates the potential of saving an image as a short text embedding, which in turn can be used to generate high-fidelity images that are perceptually equivalent to the original. For a given image, the corresponding text embedding is learned with the same optimization process as the text-to-image diffusion model itself, using a learnable text embedding as input and bypassing the original transformer. The optimization is applied together with a learned compression model to achieve extreme compression at bitrates below 0.1 bpp. In experiments measured with a comprehensive set of image quality metrics, our method outperforms other state-of-the-art deep learning methods in both perceptual quality and diversity.
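To give a sense of how tight the 0.1 bpp constraint is, the short sketch below converts a bits-per-pixel figure into a total bit budget and a compression ratio relative to uncompressed 8-bit RGB (24 bpp). The 512x512 image size and the function names are illustrative assumptions, not values taken from the paper.

```python
def bit_budget(width: int, height: int, bpp: float) -> float:
    """Total bits available for a width x height image at `bpp` bits per pixel."""
    return width * height * bpp


def bits_per_pixel(total_bits: float, width: int, height: int) -> float:
    """Bitrate in bits per pixel for a compressed payload of `total_bits`."""
    return total_bits / (width * height)


def ratio_vs_raw_rgb(bpp: float, raw_bpp: float = 24.0) -> float:
    """Compression ratio relative to uncompressed 8-bit RGB (3 channels x 8 bits)."""
    return raw_bpp / bpp


if __name__ == "__main__":
    # Assumed example size; the abstract does not state an image resolution.
    width, height = 512, 512
    budget = bit_budget(width, height, 0.1)
    print(f"budget: {budget:.1f} bits ({budget / 8 / 1024:.2f} KiB)")
    print(f"ratio vs raw RGB: {ratio_vs_raw_rgb(0.1):.0f}x")
```

Under these assumptions, a 512x512 image at 0.1 bpp has about 26,214 bits (roughly 3.2 KiB) for the entire compressed representation, which is why any stored embedding must itself be compressed aggressively.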

updated: Mon Nov 14 2022 22:54:19 GMT+0000 (UTC)

published: Mon Nov 14 2022 22:54:19 GMT+0000 (UTC)

arXiv
