An Efficient Dual-reference Training Data Acquisition Method for CNN Image Super-Resolution

Yanhui Guo; Xiao Shu; Xiaolin Wu

For deep learning methods of image super-resolution, the most critical issue is whether the paired low- and high-resolution training images accurately reflect the sampling process of real cameras. Low- and high-resolution (LR∼HR) image pairs synthesized by existing degradation models (e.g., bicubic downsampling) deviate from those in reality; a super-resolution CNN trained on such synthesized pairs therefore performs poorly when applied to real images. In this paper, we propose a novel method to capture a large set of realistic LR∼HR image pairs using real cameras. The data acquisition is carried out under controllable lab conditions with minimal human intervention and at high throughput (about 500 image pairs per hour). The high level of automation makes it easy to produce a set of real LR∼HR training image pairs for each camera. Our innovation is to shoot images displayed on an ultra-high-quality screen at different resolutions. Our method has three distinctive advantages for image super-resolution. First, because the LR and HR images are taken of a planar surface in 3D (the screen), the registration problem fits a homography model exactly, and we can display specially designed markers on the image to improve registration precision. Second, the displayed digital image file can be exploited as a reference to optimize the high-frequency content of the restored image. Third, this high-efficiency data collection method makes it possible to collect a customized dataset for each camera sensor and thus to train a model specific to the intended sensor. Experimental results show that a super-resolution CNN trained on our LR∼HR dataset achieves better restoration performance on real-world images at the inference stage than one trained on existing datasets.
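To illustrate why photographing a flat screen makes registration tractable, here is a minimal sketch of fitting a homography from four marker correspondences. The abstract does not describe the authors' registration pipeline, so everything here is an assumption made for illustration: the function names (`solve_linear`, `estimate_homography`, `apply_homography`), the choice of exactly four markers, the normalization h33 = 1, and the pixel coordinates in the usage example. A practical pipeline would more likely use many markers with a robust least-squares fit.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_homography(src, dst):
    """Return a 3x3 homography (with h33 fixed to 1) mapping four src points to four dst points."""
    if len(src) != 4 or len(dst) != 4:
        raise ValueError("this sketch expects exactly four correspondences")
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, point):
    """Map a 2D point through the homography h using homogeneous coordinates."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (u, v)


if __name__ == "__main__":
    # Hypothetical marker corners (in pixels) detected in an LR shot and an HR shot.
    lr_markers = [(10.0, 12.0), (310.0, 15.0), (305.0, 230.0), (8.0, 228.0)]
    hr_markers = [(22.0, 25.0), (621.0, 29.0), (612.0, 461.0), (17.0, 458.0)]
    h = estimate_homography(lr_markers, hr_markers)
    print(apply_homography(h, (160.0, 120.0)))
```

Four correspondences determine the eight free parameters of the homography exactly, which is why the sketch can use a plain linear solve. With more markers the same two equations per point would form an overdetermined system that is better solved in the least-squares sense.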

updated: Tue Aug 24 2021 07:01:51 GMT+0000 (UTC)

published: Thu Aug 05 2021 03:31:50 GMT+0000 (UTC)

arXiv
