Multiscale Augmented Normalizing Flows for Image Compression

Marc Windsheimer; Fabian Brand; André Kaup

Most learning-based image compression methods are inefficient at high image quality because of their non-invertible design. The decoding function of the frequently applied compressive autoencoder architecture is only an approximate inverse of the encoding transform. This issue can be resolved by using invertible latent variable models, which allow perfect reconstruction if no quantization is performed. Furthermore, many traditional image and video coders apply dynamic block partitioning to vary the compression of certain image regions depending on their content. Inspired by this approach, hierarchical latent spaces have been applied to learning-based compression networks. In this paper, we present a novel concept that adapts the hierarchical latent space to augmented normalizing flows, an invertible latent variable model. Our best-performing model achieved average rate savings of more than 7% over comparable single-scale models.
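The invertibility claim can be illustrated with a minimal sketch. It assumes a single additive coupling step between an input x and an augmented latent z, which is the general building block of augmented normalizing flows; it is not the architecture proposed in the paper. The functions f and g, the weights W_f and W_g, and the array shapes are hypothetical stand-ins for learned networks. The point is that f and g need not be invertible themselves: the step is undone exactly by subtracting what was added, and only quantizing the latent introduces reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for learned transforms; neither needs to be invertible.
W_f = rng.normal(size=(4, 4))
W_g = rng.normal(size=(4, 4))


def f(x):
    """Illustrative 'encoder' transform (assumption, not from the paper)."""
    return np.tanh(x @ W_f)


def g(z):
    """Illustrative 'decoder' transform (assumption, not from the paper)."""
    return np.tanh(z @ W_g)


def forward(x, z):
    """One additive coupling step: update z from x, then x from the new z."""
    z_new = z + f(x)
    x_new = x - g(z_new)
    return x_new, z_new


def inverse(x_new, z_new):
    """Undo the coupling step by reversing the two updates in opposite order."""
    x = x_new + g(z_new)
    z = z_new - f(x)
    return x, z


x = rng.normal(size=(2, 4))   # toy "image" data
z = np.zeros_like(x)          # augmented latent, initialised to zero here

x_new, z_new = forward(x, z)

# Without quantization the input is recovered up to floating-point rounding.
x_rec, _ = inverse(x_new, z_new)
exact_error = float(np.max(np.abs(x_rec - x)))

# Rounding the latent before inversion makes the reconstruction lossy.
x_rec_q, _ = inverse(x_new, np.round(z_new))
quantized_error = float(np.max(np.abs(x_rec_q - x)))

print("max error without quantization:", exact_error)
print("max error with quantized latent:", quantized_error)
```

A compressive autoencoder, by contrast, uses a separately learned decoder, so its reconstruction error is generally nonzero even when the latent is left unquantized.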

updated: Wed May 22 2024 14:24:55 GMT+0000 (UTC)

published: Tue May 09 2023 13:42:43 GMT+0000 (UTC)

arXiv

