HiNeRV: Video Compression with Hierarchical Encoding based Neural Representation

Ho Man Kwan; Ge Gao; Fan Zhang; Andrew Gower; David Bull

Learning-based video compression is currently one of the most popular research topics, offering the potential to compete with conventional standard video codecs. In this context, Implicit Neural Representations (INRs) have previously been used to represent and compress image and video content, demonstrating relatively high decoding speed compared to other methods. However, existing INR-based methods have failed to deliver rate-quality performance comparable with the state of the art in video compression. This is mainly due to the simplicity of the employed network architectures, which limits their representation capability. In this paper, we propose HiNeRV, an INR that combines bilinear interpolation with novel hierarchical positional encoding. This structure employs depth-wise convolutional and MLP layers to build a deep and wide network architecture with much higher capacity. We further build a video codec based on HiNeRV and a refined pipeline for training, pruning and quantization that can better preserve HiNeRV's performance during lossy model compression. The proposed method has been evaluated on both the UVG and MCL-JCV datasets for video compression, demonstrating significant improvement over all existing INR baselines and competitive performance when compared to learning-based codecs (72.3% overall bit rate saving over HNeRV and 43.4% over DCVC on the UVG dataset, measured in PSNR).
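The bit rate savings above are reported at matched quality measured in PSNR. As a reminder of what that metric computes, here is a minimal sketch in plain Python. The function name `psnr`, the list-of-rows frame layout, and the 8-bit peak value of 255 are assumptions made for illustration; the abstract does not specify the paper's exact evaluation protocol (colour space, per-frame versus per-sequence averaging), so this is not a reproduction of it.

```python
import math


def psnr(reference, decoded, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized frames.

    Frames are lists of rows of pixel intensities (hypothetical layout).
    max_value is the peak intensity; 255 assumes 8-bit samples.
    """
    same_rows = len(reference) == len(decoded)
    if not same_rows or any(len(r) != len(d) for r, d in zip(reference, decoded)):
        raise ValueError("frames must have the same shape")

    num_pixels = sum(len(row) for row in reference)
    if num_pixels == 0:
        raise ValueError("frames must not be empty")

    # Mean squared error over all pixels.
    squared_error = sum(
        (a - b) ** 2
        for ref_row, dec_row in zip(reference, decoded)
        for a, b in zip(ref_row, dec_row)
    )
    mse = squared_error / num_pixels

    # Identical frames have zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the decoded frame is closer to the reference; a bit rate saving "measured in PSNR" compares the bit rates two codecs need to reach the same PSNR.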

updated: Fri Jun 16 2023 12:59:52 GMT+0000 (UTC)

published: Fri Jun 16 2023 12:59:52 GMT+0000 (UTC)

arXiv

