Neural Data-Dependent Transform for Learned Image Compression

Dezhao Wang; Wenhan Yang; Yueyu Hu; Jiaying Liu

Learned image compression has achieved great success thanks to its strong modeling capacity, but it seldom considers Rate-Distortion Optimization (RDO) for each individual input image. To explore this potential in learned codecs, we make the first attempt to build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image. Specifically, apart from the image content stream, we employ an additional model stream to generate the transform parameters at the decoder side. The presence of the model stream enables our model to learn a more abstract neural-syntax, which helps cluster the latent representations of images more compactly. Beyond the transform stage, we also adopt neural-syntax-based post-processing for scenarios that require higher-quality reconstructions regardless of the extra decoding overhead. Moreover, the model stream makes it possible to optimize both the representation and the decoder online, i.e., to perform RDO at test time. This is equivalent to a continuous online mode decision, analogous to the coding modes in traditional codecs, that improves coding efficiency based on the individual input image. Experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism, demonstrating that our method outperforms the latest conventional standard, Versatile Video Coding (VVC), and other state-of-the-art learning-based methods in coding efficiency.
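To make the idea of RDO at test time more concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a hypothetical one-parameter "decoder" `x_hat = gain * y`, where the latents `y` stand in for the content stream and `gain` stands in for a decoder parameter carried by the model stream. Quadratic penalties are used as a stand-in for the bit rate of both streams, and all names (`rd_cost`, `online_rdo`, `gain_prior`, `lam`) are illustrative assumptions.

```python
# Toy illustration of per-image (test-time) rate-distortion optimization.
# Assumptions, none taken from the paper: the decoder is x_hat = gain * y,
# and squared magnitudes act as a differentiable proxy for the bit rate.

def rd_cost(x, y, gain, gain_prior=1.0, lam=10.0):
    """Return J = R_content + R_model + lam * D for one image."""
    rate_content = sum(v * v for v in y)      # proxy for content-stream bits
    rate_model = (gain - gain_prior) ** 2     # proxy for model-stream bits
    dist = sum((xi - gain * yi) ** 2 for xi, yi in zip(x, y))
    return rate_content + rate_model + lam * dist

def online_rdo(x, y, gain, lam=10.0, lr=0.005, steps=200):
    """Jointly refine the latents and the decoder parameter by gradient descent."""
    y = list(y)                # copy so the caller's latents are not modified
    gain_prior = gain          # the trained default acts as the prior
    for _ in range(steps):
        resid = [xi - gain * yi for xi, yi in zip(x, y)]
        # dJ/dy_i = 2*y_i - 2*lam*gain*resid_i
        grad_y = [2 * yi - 2 * lam * gain * ri for yi, ri in zip(y, resid)]
        # dJ/dgain = 2*(gain - gain_prior) - 2*lam*sum(y_i * resid_i)
        grad_gain = 2 * (gain - gain_prior) - 2 * lam * sum(
            yi * ri for yi, ri in zip(y, resid))
        y = [yi - lr * gi for yi, gi in zip(y, grad_y)]
        gain -= lr * grad_gain
    return y, gain

if __name__ == "__main__":
    image = [1.0, -0.5, 0.25]       # toy "image"
    latents = [0.8, -0.4, 0.2]      # toy encoder output
    print("cost before:", rd_cost(image, latents, 1.0))
    new_latents, new_gain = online_rdo(image, latents, 1.0)
    print("cost after: ", rd_cost(image, new_latents, new_gain))
```

The point of the sketch is only that both the representation and a decoder-side parameter are updated for one specific input, trading rate against distortion. A real learned codec would replace the quadratic penalties with an entropy model and the scalar `gain` with the transform parameters generated from the model stream.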

updated: Wed Mar 09 2022 14:56:48 GMT+0000 (UTC)

published: Wed Mar 09 2022 14:56:48 GMT+0000 (UTC)

arXiv
