Beyond Learned Metadata-based Raw Image Reconstruction

Yufei Wang; Yi Yu; Wenhan Yang; Lanqing Guo; Lap-Pui Chau; Alex C. Kot; Bihan Wen

While raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels, they are not widely adopted by general users because of their substantial storage requirements. Very recent studies propose compressing raw images by designing sampling masks in the pixel space of the raw image. However, these approaches often leave room for more effective image representations and more compact metadata. In this work, we propose a novel framework that learns, in an end-to-end manner, a compact representation in the latent space that serves as metadata. We analyze how the raw image reconstruction task differs intrinsically from lossy image compression, a difference caused by the rich information available from the sRGB image. Based on this analysis, we propose a novel backbone design with asymmetric and hybrid spatial feature resolutions, which significantly improves the rate-distortion performance. We also propose a novel context model design that better predicts the order masks of encoding/decoding based on both the sRGB image and the masks of already processed features. Because it better models the correlation between order masks, the already processed information can be utilized more effectively. Moreover, a novel sRGB-guided adaptive quantization precision strategy, which dynamically assigns varying levels of quantization precision to different regions, further enhances the representation ability of the model. Finally, based on the iterative properties of the proposed context model, we propose a novel strategy to achieve variable bit rates using a single model. This strategy allows continuous coverage of a wide range of bit rates. Extensive experimental results demonstrate that the proposed method achieves better reconstruction quality with a smaller metadata size.
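To make the idea of region-dependent quantization precision concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that precision is expressed as a per-location step size; the function name `adaptive_quantize` and the example arrays are hypothetical. In the proposed method the precision is guided by the sRGB image, whereas here the step sizes are simply fixed by hand.

```python
import numpy as np

def adaptive_quantize(latent, step):
    """Round each latent value to the nearest multiple of its own step size.

    A smaller step means finer precision (and more bits) at that location.
    """
    return np.round(latent / step) * step

# Hypothetical 2x2 latent and per-location step sizes (illustrative only,
# not taken from the paper). In the paper the precision is sRGB-guided.
latent = np.array([[0.30, 1.26],
                   [-0.74, 2.10]])
step = np.array([[0.25, 0.25],   # fine precision on the top row
                 [1.00, 1.00]])  # coarse precision on the bottom row

quantized = adaptive_quantize(latent, step)
error = np.abs(latent - quantized)
print(quantized)
print(error)
```

With rounding to the nearest multiple, the error at each location is bounded by half of that location's step size, so regions assigned a finer step are reconstructed more accurately at the cost of a larger symbol alphabet.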

updated: Wed Jun 21 2023 06:59:07 GMT+0000 (UTC)

published: Wed Jun 21 2023 06:59:07 GMT+0000 (UTC)

arXiv
