Learning Cross-Scale Weighted Prediction for Efficient Neural Video Compression

Zongyu Guo; Runsen Feng; Zhizheng Zhang; Xin Jin; Zhibo Chen

Neural video codecs have demonstrated great potential in video transmission and storage applications. Existing neural hybrid video coding approaches rely on optical flow or Gaussian-scale flow for prediction, which cannot support fine-grained adaptation to diverse motion content. Towards more content-adaptive prediction, we propose a novel cross-scale prediction module that achieves more effective motion compensation. Specifically, on the one hand, we generate a reference feature pyramid as the prediction source and then transmit cross-scale flows that use the feature scale to control the precision of prediction. On the other hand, we introduce, for the first time, a weighted prediction mechanism that works even when only a single reference frame is available; by transmitting cross-scale weight maps, it helps synthesize a fine-grained prediction result. In addition to the cross-scale prediction module, we further propose a multi-stage quantization strategy, which improves rate-distortion performance with no extra computational penalty during inference. We demonstrate the encouraging performance of our efficient neural video codec (ENVC) on several benchmark datasets. In particular, the proposed ENVC can compete with the latest coding standard H.266/VVC in terms of sRGB PSNR on the UVG dataset in the low-latency mode. We also analyze in detail the effectiveness of the cross-scale prediction module in handling various video content, and provide a comprehensive ablation study of its key components. Test code is available at https://github.com/USTC-IMCL/ENVC.
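To make the cross-scale prediction idea more concrete, here is a minimal NumPy sketch built on several assumptions of our own, not taken from the paper: a single-channel 2-D array stands in for a reference feature map; the "pyramid" is a stack of progressively box-blurred copies kept at full resolution (a scale-space-style construction, not necessarily how ENVC builds its feature pyramid); each cross-scale flow is a per-pixel triple (dx, dy, s) sampled bilinearly in space and linearly along the scale axis; and the transmitted weight maps are softmax-normalized across K candidate predictions. The function names `build_pyramid`, `cross_scale_warp`, and `weighted_prediction` are hypothetical and do not come from the ENVC code release.

```python
import numpy as np


def box_blur(img: np.ndarray, radius: int) -> np.ndarray:
    """Blur a 2-D array with a (2r+1)x(2r+1) box filter, using edge padding."""
    if radius == 0:
        return img.astype(float)
    k = 2 * radius + 1
    padded = np.pad(img.astype(float), radius, mode="edge")
    # Summed-area table with a leading row/column of zeros: sat[i, j] = sum(padded[:i, :j]).
    sat = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = img.shape
    total = sat[k:k + h, k:k + w] - sat[:h, k:k + w] - sat[k:k + h, :w] + sat[:h, :w]
    return total / (k * k)


def build_pyramid(ref: np.ndarray, num_scales: int) -> np.ndarray:
    """Assumed pyramid: level 0 is the sharp reference, higher levels are blurrier."""
    radii = [0] + [2 ** (s - 1) for s in range(1, num_scales)]
    return np.stack([box_blur(ref, r) for r in radii])  # shape (S, H, W)


def bilinear(img: np.ndarray, ys: np.ndarray, xs: np.ndarray) -> np.ndarray:
    """Bilinearly sample img at fractional (ys, xs), clamping to the border."""
    h, w = img.shape
    ys = np.clip(ys, 0, h - 1)
    xs = np.clip(xs, 0, w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = ys - y0, xs - x0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bot = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bot * wy


def cross_scale_warp(pyramid: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Warp with a flow of shape (3, H, W) holding (dx, dy, s); s is a continuous level."""
    num_scales, h, w = pyramid.shape
    gy, gx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    ys, xs = gy + flow[1], gx + flow[0]
    s = np.clip(flow[2], 0, num_scales - 1)
    s0 = np.floor(s).astype(int)
    s1 = np.minimum(s0 + 1, num_scales - 1)
    ws = s - s0
    # Sample every level spatially, then blend the two nearest levels per pixel.
    sampled = np.stack([bilinear(pyramid[k], ys, xs) for k in range(num_scales)])
    lo = np.take_along_axis(sampled, s0[None], axis=0)[0]
    hi = np.take_along_axis(sampled, s1[None], axis=0)[0]
    return lo * (1 - ws) + hi * ws


def weighted_prediction(pyramid: np.ndarray, flows: np.ndarray,
                        weight_logits: np.ndarray) -> np.ndarray:
    """Blend K cross-scale warps (flows: (K, 3, H, W)) with softmax weights (K, H, W)."""
    preds = np.stack([cross_scale_warp(pyramid, f) for f in flows])
    w = np.exp(weight_logits - weight_logits.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)  # assumed normalization: weights sum to 1 per pixel
    return (w * preds).sum(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.random((16, 16))
    pyr = build_pyramid(ref, num_scales=4)
    flows = rng.normal(scale=1.0, size=(2, 3, 16, 16))
    flows[:, 2] = np.abs(flows[:, 2])  # keep scale coordinates non-negative
    pred = weighted_prediction(pyr, flows, rng.normal(size=(2, 16, 16)))
    print(pred.shape)
```

In this sketch a larger s draws from a blurrier level, which mirrors the intuition of using feature scale to control prediction precision: sharp levels where motion is well predicted, smooth levels where it is uncertain. Blending several such warps with per-pixel weights lets a single reference frame produce a mixture prediction, in the spirit of the single-reference weighted prediction described in the abstract.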

updated: Wed Mar 15 2023 15:48:47 GMT+0000 (UTC)

published: Sun Dec 26 2021 03:12:17 GMT+0000 (UTC)

arXiv