JSI-GAN: GAN-Based Joint Super-Resolution and Inverse Tone-Mapping with Pixel-Wise Task-Specific Filters for UHD HDR Video

Soo Ye Kim; Jihyong Oh; Munchurl Kim

Joint learning of super-resolution (SR) and inverse tone-mapping (ITM) has recently been explored as a way to convert legacy low resolution (LR) standard dynamic range (SDR) videos to high resolution (HR) high dynamic range (HDR) videos, in response to the growing need of UHD HDR TV/broadcasting applications. However, previous CNN-based methods directly reconstruct the HR HDR frames from LR SDR frames, and are trained only with a simple L2 loss. In this paper, we take a divide-and-conquer approach in designing a novel GAN-based joint SR-ITM network, called JSI-GAN, which is composed of three task-specific subnets: an image reconstruction subnet, a detail restoration (DR) subnet and a local contrast enhancement (LCE) subnet. We carefully design these subnets so that each is trained for its intended purpose: the DR subnet learns a pair of pixel-wise 1D separable filters for detail restoration, and the LCE subnet learns a pixel-wise 2D local filter for contrast enhancement. Moreover, to train JSI-GAN effectively, we propose a novel detail GAN loss alongside the conventional GAN loss, which helps enhance both local details and contrasts to reconstruct high quality HR HDR results. When all subnets are jointly trained well, the predicted HR HDR results are of higher quality, with a PSNR gain of at least 0.41 dB over those generated by the previous methods.
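
To make the idea of pixel-wise filters concrete, the following is a minimal NumPy sketch of how a per-pixel 2D local filter and a per-pixel pair of 1D separable filters could be applied to a single-channel image. It is an illustration only, not the authors' implementation: the function names, the edge padding, the odd kernel size, and the choice to combine the 1D pair as an outer product at each pixel are all assumptions, and the abstract does not specify how the subnet outputs are combined.

```python
import numpy as np


def extract_patches(img, k):
    """Return an (H, W, k, k) array with the k x k neighbourhood of every pixel.

    Assumes a 2D image and an odd kernel size k; borders use edge padding
    (an assumption made for this sketch).
    """
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    patches = np.empty((h, w, k, k), dtype=img.dtype)
    for dy in range(k):
        for dx in range(k):
            patches[:, :, dy, dx] = padded[dy:dy + h, dx:dx + w]
    return patches


def apply_local_2d(img, filters):
    """Apply one k x k kernel per pixel; filters has shape (H, W, k, k)."""
    k = filters.shape[-1]
    patches = extract_patches(img, k)
    # Each output pixel is the weighted sum of its own neighbourhood.
    return np.einsum("hwij,hwij->hw", patches, filters)


def apply_separable_1d(img, vert, horiz):
    """Apply a per-pixel (vertical, horizontal) pair of 1D kernels.

    vert and horiz have shape (H, W, k). At each pixel the pair acts as the
    outer-product kernel vert[i] * horiz[j] (assumption of this sketch).
    """
    k = vert.shape[-1]
    patches = extract_patches(img, k)
    return np.einsum("hwij,hwi,hwj->hw", patches, vert, horiz)
```

Under these assumptions, a separable pair needs 2k predicted coefficients per pixel where a full 2D kernel needs k squared, which is one plausible reason to prefer separable filters when larger kernels are wanted.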

updated: Mon Dec 16 2019 06:20:32 GMT+0000 (UTC)

published: Tue Sep 10 2019 10:30:35 GMT+0000 (UTC)

arXiv
