Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

Gen Li; Jie Ji; Minghai Qin; Wei Niu; Bin Ren; Fatemeh Afghah; Linke Guo; Xiaolong Ma

時空間データのオーバーフィッティングによる高品質かつ効率的なビデオ超解像化を目指して

ディープ畳み込みニューラルネットワーク (DNN) がコンピュータービジョンのさまざまな分野で広く使用されているため、DNN のオーバーフィッティング機能を活用してビデオ解像度のアップスケーリングを実現することが、現代のビデオ配信システムの新しいトレンドになっています。ビデオをチャンクに分割し、各チャンクを超解像度モデルでオーバーフィットすることにより、サーバーはビデオをクライアントに送信する前にエンコードするため、ビデオ品質と送信効率が向上します。ただし、多数のチャンクにより良好なオーバーフィッティング品質が確保されることが期待され、これによりストレージが大幅に増加し、データ送信により多くの帯域幅リソースが消費されます。一方、トレーニング最適化手法によってチャンクの数を減らすには、通常、高いモデル容量が必要となり、実行速度が大幅に低下します。これを調和させるために、私たちは、時空間情報を利用してビデオを正確にチャンクに分割し、チャンクの数とモデルサイズを最小限に抑える、高品質で効率的なビデオ解像度アップスケーリングタスクのための新しい方法を提案します。さらに、データを意識した共同トレーニング手法によって手法を単一の過学習モデルに進化させ、品質の低下を無視してストレージ要件をさらに削減します。私たちは既製の携帯電話にモデルを展開し、実験結果は、私たちの方法が高いビデオ品質を備えたリアルタイムビデオ超解像度を達成することを示しています。最先端の方法と比較して、私たちの方法は 41.6 PSNR で 28 fps のストリーミング速度を達成します。これは、ライブビデオの解像度アップスケーリングタスクにおいて 14 倍高速で 2.29 dB 優れています。コードは https://github.com/coulsonlee/STDO-CVPR2023.git で入手できます

As deep convolutional neural networks (DNNs) are widely used in various fields of computer vision, leveraging the overfitting ability of the DNN to achieve video resolution upscaling has become a new trend in the modern video delivery system. By dividing videos into chunks and overfitting each chunk with a super-resolution model, the server encodes videos before transmitting them to the clients, thus achieving better video quality and transmission efficiency. However, a large number of chunks are expected to ensure good overfitting quality, which substantially increases the storage and consumes more bandwidth resources for data transmission. On the other hand, decreasing the number of chunks through training optimization techniques usually requires high model capacity, which significantly slows down execution speed. To reconcile such, we propose a novel method for high-quality and efficient video resolution upscaling tasks, which leverages the spatial-temporal information to accurately divide video into chunks, thus keeping the number of chunks as well as the model size to minimum. Additionally, we advance our method into a single overfitting model by a data-aware joint training technique, which further reduces the storage requirement with negligible quality drop. We deploy our models on an off-the-shelf mobile phone, and experimental results show that our method achieves real-time video super-resolution with high video quality. Compared with the state-of-the-art, our method achieves 28 fps streaming speed with 41.6 PSNR, which is 14× faster and 2.29 dB better in the live video resolution upscaling tasks. Code available in https://github.com/coulsonlee/STDO-CVPR2023.git

updated: Sun Jun 18 2023 15:29:37 GMT+0000 (UTC)

published: Wed Mar 15 2023 02:40:02 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト