End-to-End Learned Block-Based Image Compression with Block-Level Masked Convolutions and Asymptotic Closed Loop Training

Fatih Kamisli

Learned image compression research has achieved state-of-the-art compression performance with auto-encoder-based neural network architectures, in which the image is mapped via convolutional neural networks (CNNs) into a latent representation that is quantized and then processed again with CNNs to obtain the reconstructed image. These CNNs operate on entire input images. In contrast, traditional state-of-the-art image and video compression methods process images block by block for various reasons. Very recently, work on learned image compression with block-based approaches has also appeared; it uses the auto-encoder architecture on large blocks of the input image and introduces additional neural networks that perform intra/spatial prediction and deblocking/post-processing functions. This paper explores an alternative learned block-based image compression approach in which neither an explicit intra prediction neural network nor an explicit deblocking neural network is used. A single auto-encoder neural network with block-level masked convolutions is used, and the block size is much smaller (8x8). With block-level masked convolutions, each block is processed using the reconstructed neighboring left and upper blocks at both the encoder and the decoder. Hence, the mutual information between adjacent blocks is exploited during compression and each block is reconstructed using its neighboring blocks, removing the need for explicit intra prediction and deblocking neural networks. Since the explored system is a closed-loop system, a special optimization procedure, the asymptotic closed-loop design, is used with standard stochastic gradient descent based training. The experimental results indicate competitive image compression performance.
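The idea of a block-level mask can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation: the function names, the 3x3 kernel size, the `include_current` switch, and the choice to treat the upper-left diagonal block as available are all assumptions made for illustration. For a kernel centered on one pixel, the sketch marks which taps fall in the current 8x8 block or in its left and upper neighbors, and zeroes the rest.

```python
def block_index(y, x, block=8):
    """Return (block_row, block_col) of the pixel at (y, x)."""
    return y // block, x // block


def block_level_mask(y, x, ksize=3, block=8, include_current=True):
    """Build a ksize x ksize 0/1 mask for a kernel centered on pixel (y, x).

    Assumption (not from the paper): a tap is kept if its block lies to the
    left of, above, or diagonally above-left of the current block, or in the
    current block itself when include_current is True. Taps with negative
    coordinates (outside the image) are masked out; right and bottom image
    borders are not checked in this sketch.
    """
    cur_r, cur_c = block_index(y, x, block)
    half = ksize // 2
    mask = []
    for dy in range(-half, half + 1):
        row = []
        for dx in range(-half, half + 1):
            ty, tx = y + dy, x + dx
            if ty < 0 or tx < 0:
                row.append(0)
                continue
            r, c = block_index(ty, tx, block)
            same = (r == cur_r and c == cur_c)
            neighbor = (r <= cur_r and c <= cur_c and not same)
            row.append(1 if neighbor or (same and include_current) else 0)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Bottom-right pixel of block (1, 1): taps reaching the right and
    # lower blocks are masked out.
    for row in block_level_mask(15, 15):
        print(row)
```

In a real network the mask would multiply the convolution weights at each position, so that no output of a block depends on blocks that are not among its left and upper neighbors.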

updated: Tue Mar 22 2022 13:01:59 GMT+0000 (UTC)

published: Tue Mar 22 2022 13:01:59 GMT+0000 (UTC)

arXiv
