Unsupervised HDR Image and Video Tone Mapping via Contrastive Learning

Cong Cao; Huanjing Yue; Xin Liu; Jingyu Yang


Capturing high dynamic range (HDR) images and videos is attractive because it can reveal details in both dark and bright regions. Since mainstream screens support only low dynamic range (LDR) content, a tone mapping algorithm is required to compress the dynamic range of HDR images and videos. Although image tone mapping has been widely explored, video tone mapping has lagged behind, especially for deep-learning-based methods, due to the lack of HDR-LDR video pairs. In this work, we propose a unified framework (IVTMNet) for unsupervised image and video tone mapping. To improve unsupervised training, we propose a domain- and instance-based contrastive learning loss. Instead of using a universal feature extractor such as VGG to extract features for similarity measurement, we propose a novel latent code, which is an aggregation of the brightness and contrast of extracted features, to measure the similarity of different pairs. In total, we construct two negative pairs and three positive pairs to constrain the latent codes of the tone-mapped results. For the network structure, we propose a spatial-feature-enhanced (SFE) module to enable information exchange and transformation across nonlocal regions. For video tone mapping, we propose a temporal-feature-replaced (TFR) module to efficiently utilize temporal correlation and improve the temporal consistency of the tone-mapped video results. We construct a large-scale unpaired HDR-LDR video dataset to facilitate the unsupervised training process for video tone mapping. Experimental results demonstrate that our method outperforms state-of-the-art image and video tone mapping methods. Our code and dataset are available at https://github.com/cao-cong/UnCLTMO.
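
The latent code and the contrastive constraint described above can be illustrated with a minimal Python sketch. This is not the released implementation. It assumes that "brightness" is approximated by the per-channel mean and "contrast" by the per-channel standard deviation of a feature map, and that the loss is a ratio of L1 distances to positives over L1 distances to negatives. The names `latent_code`, `l1_distance`, and `contrastive_loss` are hypothetical, and the abstract does not specify the actual loss form or pair construction.

```python
import math


def latent_code(feature_maps):
    """Aggregate per-channel brightness and contrast into one vector.

    feature_maps: list of channels, each a flat list of floats.
    Assumption: brightness = mean, contrast = standard deviation.
    """
    code = []
    for channel in feature_maps:
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        code.extend([mean, math.sqrt(var)])
    return code


def l1_distance(a, b):
    """Sum of absolute differences between two latent codes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def contrastive_loss(anchor, positives, negatives, eps=1e-8):
    """Assumed ratio-style loss: small when the anchor is close to the
    positives and far from the negatives."""
    pos = sum(l1_distance(anchor, p) for p in positives)
    neg = sum(l1_distance(anchor, n) for n in negatives)
    return pos / (neg + eps)


# Toy usage with three positives and two negatives, matching the pair
# counts in the abstract. All values here are made up for illustration.
anchor = latent_code([[0.4, 0.6], [0.2, 0.8]])
positives = [latent_code([[0.5, 0.7], [0.3, 0.7]]) for _ in range(3)]
negatives = [latent_code([[0.0, 0.1], [0.9, 1.0]]) for _ in range(2)]
loss = contrastive_loss(anchor, positives, negatives)
```

Minimizing a loss of this shape pulls the latent code of a tone-mapped result toward the positive samples and pushes it away from the negative ones; which images serve as positives and negatives is defined in the paper itself, not here.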

updated: Mon Jun 26 2023 13:56:52 GMT+0000 (UTC)

published: Mon Mar 13 2023 17:45:39 GMT+0000 (UTC)

arXiv
