EVC: Towards Real-Time Neural Image Compression with Mask Decay

Guo-Hua Wang; Jiahao Li; Bin Li; Yan Lu

Neural image compression has surpassed state-of-the-art traditional codecs (H.266/VVC) in rate-distortion (RD) performance, but it suffers from high complexity and requires separate models for different rate-distortion trade-offs. In this paper, we propose an Efficient single-model Variable-bit-rate Codec (EVC), which runs at 30 FPS on 768x512 input images and still outperforms VVC in RD performance. By further reducing both encoder and decoder complexity, our small model achieves 30 FPS even on 1920x1080 input images. To bridge the performance gap between our models of different capacities, we carefully design mask decay, which automatically transforms the large model's parameters into the small model. We also propose a novel sparsity regularization loss to mitigate the shortcomings of L_p regularization. Our algorithm significantly narrows the performance gap by 50% and 30% for our medium and small models, respectively. Finally, we advocate a scalable encoder for neural image compression, whose encoding complexity is dynamic so that it can meet different latency requirements. We propose decaying the large encoder multiple times to progressively reduce the residual representation. Both mask decay and residual representation learning greatly improve the RD performance of our scalable encoder. Our code is at https://github.com/microsoft/DCVC.
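The abstract describes mask decay only at a high level, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes a per-channel mask that is shrunk geometrically for the channels a smaller model would not keep, after which those channels are dropped. The helper names (`decay_step`, `prune`), the decay rate, and the threshold are all hypothetical and chosen for illustration; the paper's sparsity regularization loss is not modeled here.

```python
# Illustrative sketch only; not the EVC implementation.
# Assumption: each output channel has a scalar mask in [0, 1].


def decay_step(masks, keep, rate):
    """Multiply the mask of every channel at index >= keep by (1 - rate)."""
    return [m if i < keep else m * (1.0 - rate) for i, m in enumerate(masks)]


def prune(weights, masks, threshold=1e-3):
    """Fold each surviving mask into its channel's weights and
    drop the channels whose mask has fallen below `threshold`."""
    return [[w * m for w in row]
            for row, m in zip(weights, masks)
            if m >= threshold]


# Hypothetical "large" layer: 4 output channels with 2 weights each.
large = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
masks = [1.0] * len(large)

# In real training these steps would be interleaved with gradient updates,
# giving the kept channels time to absorb what the decayed channels encoded.
for _ in range(20):
    masks = decay_step(masks, keep=2, rate=0.5)

small = prune(large, masks)
print(len(small))  # number of channels left in the "small" layer
```

The point of decaying gradually, instead of deleting channels at once, is that the network can adapt while the masked channels fade out; how EVC schedules and regularizes this is specified in the paper and its code, not in this sketch.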

updated: Fri Feb 10 2023 06:02:29 GMT+0000 (UTC)

published: Fri Feb 10 2023 06:02:29 GMT+0000 (UTC)

arXiv

