You Can Mask More For Extremely Low-Bitrate Image Compression

Anqi Li; Feng Li; Jiaxin Han; Huihui Bai; Runmin Cong; Chunjie Zhang; Meng Wang; Weisi Lin; Yao Zhao

Learned image compression (LIC) methods have made significant progress in recent years. However, these methods are primarily dedicated to optimizing rate-distortion (R-D) performance at medium and high bitrates (> 0.1 bits per pixel (bpp)), while research on extremely low bitrates is limited. Moreover, existing methods fail to explicitly explore the image structure and texture components that are crucial for image compression, treating them the same as uninformative components in the network. This can cause severe perceptual quality degradation, especially in low-bitrate scenarios. In this work, inspired by the success of pre-trained masked autoencoders (MAE) in many downstream tasks, we propose to rethink the MAE mask sampling strategy from structure and texture perspectives to achieve high redundancy reduction and discriminative feature representation, further unleashing the potential of LIC methods. To this end, we present a dual-adaptive masking approach (DA-Mask) that samples visible patches based on the structure and texture distributions of the original image. We combine DA-Mask and a pre-trained MAE in masked image modeling (MIM) as an initial compressor that abstracts informative semantic context and texture representations. Such a pipeline cooperates well with LIC networks to achieve further secondary compression while preserving promising reconstruction quality. Consequently, we propose a simple yet effective masked compression model (MCM), the first framework that unifies MIM and LIC end-to-end for extremely low-bitrate image compression. Extensive experiments demonstrate that our approach outperforms recent state-of-the-art methods in R-D performance, visual quality, and downstream applications at very low bitrates. Our code is available at https://github.com/lianqi1008/MCM.git.
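The idea of sampling visible patches according to structure and texture can be illustrated with a minimal sketch. This is not the authors' DA-Mask implementation (see the repository linked above for that). It assumes, purely for illustration, that "structure" is approximated by mean gradient magnitude per patch, "texture" by per-patch intensity variance, and that the two are blended with a hypothetical weight `alpha`. The function names `patch_scores` and `sample_visible` and the parameters `keep_ratio`, `patch`, and `seed` are likewise invented for this example.

```python
import numpy as np


def patch_scores(img, patch=16):
    """Per-patch structure and texture scores for a 2-D grayscale image.

    Assumption (not from the paper): structure = mean gradient magnitude,
    texture = intensity variance, both computed per non-overlapping patch.
    """
    h, w = img.shape
    gh, gw = h // patch, w // patch
    # Crop so the image divides evenly into patches.
    img = img[: gh * patch, : gw * patch].astype(np.float64)

    gy, gx = np.gradient(img)   # derivatives along rows and columns
    grad = np.hypot(gx, gy)     # gradient magnitude per pixel

    def blocks(a):
        # Rearrange (H, W) into (num_patches, patch * patch).
        return a.reshape(gh, patch, gw, patch).swapaxes(1, 2).reshape(gh * gw, -1)

    structure = blocks(grad).mean(axis=1)
    texture = blocks(img).var(axis=1)
    return structure, texture


def sample_visible(img, keep_ratio=0.25, patch=16, alpha=0.5, seed=0):
    """Sample indices of visible patches with probability proportional to a
    blend of the structure and texture scores (illustrative heuristic)."""
    structure, texture = patch_scores(img, patch)

    def normalize(s):
        s = s + 1e-8            # keep every patch selectable
        return s / s.sum()

    prob = alpha * normalize(structure) + (1 - alpha) * normalize(texture)
    n = prob.size
    k = max(1, int(round(keep_ratio * n)))

    rng = np.random.default_rng(seed)
    visible = rng.choice(n, size=k, replace=False, p=prob)
    return np.sort(visible)
```

With a 64x64 image and 16x16 patches there are 16 patches, so `keep_ratio=0.25` keeps 4 of them; the remaining 12 would be masked and left for a masked autoencoder to reconstruct. Patches with strong edges or high variance are more likely to stay visible, which is the intuition the abstract describes, though the actual sampling rule in the paper may differ.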

updated: Tue Jun 27 2023 15:36:22 GMT+0000 (UTC)

published: Tue Jun 27 2023 15:36:22 GMT+0000 (UTC)

arXiv
