Towards Real-World Video Denosing: A Practical Video Denosing Dataset and Network

Xiaogang Xu; Yitong Yu; Nianjuan Jiang; Jiangbo Lu; Bei Yu; Jiaya Jia

実世界のビデオデノシングに向けて: 実用的なビデオデノシングデータセットとネットワーク

ビデオノイズ除去の研究を促進するために、sRGB と RAW 形式の両方で 200 のノイズの少ないクリーンな動的ビデオペアを含む、説得力のあるデータセット、つまり「実用的なビデオノイズ除去データセット」(PVDD) を構築します。限られたモーション情報で構成される既存のデータセットと比較して、PVDD は変動する自然なモーションを持つダイナミックなシーンをカバーします。主にガウス分布またはポアソン分布を使用して sRGB ドメインのノイズを合成するデータセットとは異なり、PVDD は物理的に意味のあるセンサーノイズモデルとそれに続く ISP 処理を使用して RAW ドメインから現実的なノイズを合成します。さらに、Recurrent Video Denoising Transformer (RVDT) と呼ばれる新しいビデオノイズ除去フレームワークも提案します。これは、PVDD およびその他の現在のビデオノイズ除去ベンチマークで SOTA パフォーマンスを達成できます。 RVDT は、空間次元と時間次元の長期伝搬でノイズ除去を行うための空間変換ブロックと時間変換ブロックの両方で構成されています。特に、RVDT はアテンションメカニズムを活用して、暗黙的および明示的な時間モデリングの両方で双方向の特徴伝播を実装します。広範な実験により、1) PVDD でトレーニングされたモデルは、他の既存のデータセットでトレーニングされたモデルよりも、多くの挑戦的な現実世界のビデオで優れたノイズ除去パフォーマンスを達成することが実証されています。 2) 同じデータセットでトレーニングした場合、提案された RVDT は、他のタイプのネットワークよりも優れたノイズ除去パフォーマンスを持つことができます。

To facilitate video denoising research, we construct a compelling dataset, namely, "Practical Video Denoising Dataset" (PVDD), containing 200 noisy-clean dynamic video pairs in both sRGB and RAW format. Compared with existing datasets consisting of limited motion information, PVDD covers dynamic scenes with varying and natural motion. Different from datasets using primarily Gaussian or Poisson distributions to synthesize noise in the sRGB domain, PVDD synthesizes realistic noise from the RAW domain with a physically meaningful sensor noise model followed by ISP processing. Moreover, we also propose a new video denoising framework, called Recurrent Video Denoising Transformer (RVDT), which can achieve SOTA performance on PVDD and other current video denoising benchmarks. RVDT consists of both spatial and temporal transformer blocks to conduct denoising with long-range operations on the spatial dimension and long-term propagation on the temporal dimension. Especially, RVDT exploits the attention mechanism to implement the bi-directional feature propagation with both implicit and explicit temporal modeling. Extensive experiments demonstrate that 1) models trained on PVDD achieve superior denoising performance on many challenging real-world videos than on models trained on other existing datasets; 2) trained on the same dataset, our proposed RVDT can have better denoising performance than other types of networks.

updated: Thu Dec 01 2022 08:23:48 GMT+0000 (UTC)

published: Mon Jul 04 2022 12:30:22 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト