Digital media is ubiquitous and produced in ever-growing quantities. This necessitates a constant evolution of compression techniques, especially for video, in order to maintain efficient storage and transmission. In this work, we aim to exploit non-local redundancies in video data that remain difficult for conventional video codecs to remove. We design convolutional neural networks with a particular emphasis on a low memory and computational footprint. The parameters of these networks are trained on the fly, at encoding time, to predict the residual signal from the decoded video signal. After the training process has converged, the parameters are compressed and signalled as part of the code of the underlying video codec. The method can be applied to any existing video codec to increase coding gains, and its low computational footprint permits use under resource-constrained conditions. Building on top of High Efficiency Video Coding, we achieve coding gains similar to those of pretrained denoising CNNs while requiring only about 1% of their computational complexity. Through extensive experiments, we provide insights into the effectiveness of our network design decisions. In addition, we demonstrate that our algorithm delivers stable performance under conditions met in practical video compression: it shows no significant performance loss on very long random access segments (up to 256 frames) and, with moderate performance drops, can even be applied to single frames in high-resolution low-delay settings.
updated: Tue Apr 28 2020 05:53:52 GMT+0000 (UTC)
published: Sat Oct 19 2019 10:09:39 GMT+0000 (UTC)
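
The core idea of the abstract, fitting a small model at encoding time to predict the residual from the decoded signal, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `fake_codec` is a uniform quantizer standing in for a real codec, the "network" is a single 3-tap linear filter on a 1-D signal rather than the CNNs described above, and the compression and signalling of the trained parameters is omitted.

```python
import math

def fake_codec(signal, step=0.1):
    """Hypothetical stand-in for a lossy codec: uniform quantization."""
    return [round(s / step) * step for s in signal]

def taps_at(decoded, i):
    """Three neighbouring decoded samples around i, with edge replication."""
    n = len(decoded)
    return (decoded[max(i - 1, 0)], decoded[i], decoded[min(i + 1, n - 1)])

def predict_residual(decoded, w, b):
    """Predict (original - decoded) from the decoded signal only."""
    return [sum(wk * t for wk, t in zip(w, taps_at(decoded, i))) + b
            for i in range(len(decoded))]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def train_on_the_fly(original, decoded, steps=500, lr=0.1):
    """Encoder-side fit: plain gradient descent on the MSE between the
    predicted residual and the true residual. Only the encoder sees
    `original`; the decoder would receive (w, b) and apply the filter."""
    n = len(decoded)
    target = [o - d for o, d in zip(original, decoded)]
    w, b = [0.0, 0.0, 0.0], 0.0  # zero init: restored == decoded at start
    for _ in range(steps):
        pred = predict_residual(decoded, w, b)
        gw, gb = [0.0, 0.0, 0.0], 0.0
        for i in range(n):
            err = 2.0 * (pred[i] - target[i]) / n
            for k, t in enumerate(taps_at(decoded, i)):
                gw[k] += err * t
            gb += err
        w = [wk - lr * gk for wk, gk in zip(w, gw)]
        b -= lr * gb
    return w, b

# Toy "video": a smooth signal in [0.1, 0.9], coarsely quantized.
original = [0.5 + 0.4 * math.sin(2 * math.pi * i / 64) for i in range(256)]
decoded = fake_codec(original)

w, b = train_on_the_fly(original, decoded)
restored = [d + r for d, r in zip(decoded, predict_residual(decoded, w, b))]

mse_before = mse(original, decoded)
mse_after = mse(original, restored)
print(f"MSE decoded: {mse_before:.6f}, MSE restored: {mse_after:.6f}")
```

Because the parameters start at zero, the restored signal initially equals the decoded one, so training can only help as long as the step size is small enough for the descent to be stable. In the setting described above, the bits spent on transmitting the trained parameters would also have to be weighed against the distortion reduction, which this sketch does not model.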