SIDAR: Synthetic Image Dataset for Alignment & Restoration

Monika Kwiatkowski; Simon Matern; Olaf Hellwich

Image alignment and image restoration are classical computer vision tasks. However, datasets that provide enough data to train and evaluate end-to-end deep learning models are still lacking. Obtaining ground-truth data for image alignment requires sophisticated structure-from-motion methods or optical flow systems, which often do not provide enough data variance: they typically yield a large number of image correspondences but introduce only few changes of scenery within the underlying image sequences. Alternative approaches apply random perspective distortions to existing image data. However, this yields only trivial distortions that lack the complexity and variance of real-world scenarios. Instead, our proposed data augmentation helps to overcome the issue of data scarcity by using 3D rendering: images are added as textures onto a plane, and varying lighting conditions, shadows, and occlusions are then added to the scene. The scene is rendered from multiple viewpoints, generating perspective distortions more consistent with real-world scenarios, with homographies that closely resemble those of camera projections rather than randomized homographies. For each scene, we provide a sequence of distorted images with corresponding occlusion masks, homographies, and ground-truth labels. The resulting dataset can serve as a training and evaluation set for a multitude of tasks involving image alignment and artifact removal, such as deep homography estimation, dense image matching, 2D bundle adjustment, inpainting, shadow removal, denoising, content retrieval, and background subtraction. Our data generation pipeline is customizable and can be applied to any existing dataset, serving as a data augmentation to further improve the feature learning of any existing method.
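To illustrate why viewing a textured plane from a camera yields homographies that "resemble those of camera projections", the following is a minimal NumPy sketch. It is not the authors' rendering pipeline. It assumes an ideal pinhole camera, a plane at z = 0 in world coordinates, and an OpenCV-style camera frame (x right, y down, z forward). All function names, the intrinsic matrix, and the sampling ranges are hypothetical choices made for illustration. For such a plane, the world-to-image map reduces to the 3x3 matrix K [r1 r2 t], and the homography between two views is the composition of one such map with the inverse of the other.

```python
import numpy as np


def look_at(cam_pos, target, up=(0.0, 1.0, 0.0)):
    """Return world-to-camera rotation R and translation t for a camera
    at cam_pos looking at target (x right, y down, z forward).
    Degenerate if the viewing direction is parallel to `up`."""
    cam_pos = np.asarray(cam_pos, dtype=float)
    target = np.asarray(target, dtype=float)
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(up, dtype=float))
    right = right / np.linalg.norm(right)
    down = np.cross(forward, right)
    R = np.stack([right, down, forward])  # rows are the camera axes
    t = -R @ cam_pos
    return R, t


def plane_homography(K, R, t):
    """Homography mapping plane coordinates (X, Y, 1) on z = 0 to pixels."""
    return K @ np.column_stack([R[:, 0], R[:, 1], t])


def project(K, R, t, X):
    """Full pinhole projection of a 3D world point X to pixel coordinates."""
    x = K @ (R @ np.asarray(X, dtype=float) + t)
    return x[:2] / x[2]


def sample_camera(rng, radius=5.0, max_tilt_deg=50.0):
    """Sample a camera position on a spherical cap above the plane
    (illustrative ranges, not taken from the paper)."""
    tilt = np.deg2rad(rng.uniform(0.0, max_tilt_deg))
    azim = rng.uniform(0.0, 2.0 * np.pi)
    return radius * np.array([np.sin(tilt) * np.cos(azim),
                              np.sin(tilt) * np.sin(azim),
                              np.cos(tilt)])


# Hypothetical intrinsics: 800 px focal length, 640x480 image.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

rng = np.random.default_rng(0)
R_a, t_a = look_at(sample_camera(rng), np.zeros(3))
R_b, t_b = look_at(sample_camera(rng), np.zeros(3))

# Ground-truth homography taking pixels of view a to pixels of view b.
H_ab = plane_homography(K, R_b, t_b) @ np.linalg.inv(plane_homography(K, R_a, t_a))
```

Because each `H_ab` is derived from two physically valid camera poses, it is a ground-truth label of the kind the abstract describes, in contrast to homographies obtained by randomly perturbing image corners. Lighting, shadows, and occlusions are outside the scope of this sketch.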

updated: Fri May 19 2023 23:32:06 GMT+0000 (UTC)

published: Fri May 19 2023 23:32:06 GMT+0000 (UTC)

arXiv
