Intrinsic Decomposition of Document Images In-the-Wild

Sagnik Das; Hassan Ahmed Sial; Ke Ma; Ramon Baldrich; Maria Vanrell; Dimitris Samaras

Automatic document content processing is affected by artifacts caused by the shape of the paper and by non-uniform lighting conditions with diverse illuminant colors. Fully supervised methods on real data are infeasible because of the large amount of data they require. Hence, current state-of-the-art deep learning models are trained on fully or partially synthetic images. However, document shadow or shading removal results still suffer because (a) prior methods rely on the uniformity of local color statistics, which limits their application to real scenarios with complex document shapes and textures; and (b) the models are trained on synthetic or hybrid datasets with unrealistic, simulated lighting conditions. In this paper we tackle these problems with two main contributions. First, a physically constrained, learning-based method that directly estimates document reflectance based on intrinsic image formation and generalizes to challenging illumination conditions. Second, a new dataset that clearly improves on previous synthetic ones by adding a large range of realistic shading and diverse multi-illuminant conditions, uniquely customized to deal with documents in-the-wild. The proposed architecture works in a self-supervised manner where only the synthetic texture is used as a weak training signal (obviating the need for very costly ground truth with disentangled versions of shading and reflectance). The proposed approach significantly improves the generalization of document reflectance estimation in real scenes with challenging illumination. We extensively evaluate on the real benchmark datasets available for intrinsic image decomposition and document shadow removal tasks. When used as a pre-processing step in an OCR pipeline, our reflectance estimation scheme improves the character error rate (CER) by 26%, demonstrating its practical applicability.
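The method rests on the intrinsic image formation model, in which each observed pixel is the product of a reflectance term (the printed document texture) and a shading term; with multiple colored illuminants the shading differs per color channel. The Python sketch below is a minimal illustration of that textbook model under stated assumptions (Lambertian surface, per-channel RGB shading, values in [0, 1]). The helper names `compose` and `recover_reflectance` are hypothetical, and this is not the paper's network or training objective.

```python
"""Minimal sketch of the intrinsic image model I = R * S (per pixel, per channel).

Assumptions (illustrative only, not the paper's architecture): a Lambertian,
diffuse surface, RGB (colored) shading, and all values normalized to [0, 1].
"""
from typing import List, Tuple

Pixel = Tuple[float, float, float]
Image = List[List[Pixel]]

EPS = 1e-6  # guards against division by zero in fully dark regions


def compose(reflectance: Image, shading: Image) -> Image:
    """Render an observed image as the element-wise product R * S."""
    return [
        [tuple(r * s for r, s in zip(rp, sp)) for rp, sp in zip(r_row, s_row)]
        for r_row, s_row in zip(reflectance, shading)
    ]


def recover_reflectance(image: Image, shading: Image) -> Image:
    """Invert the model, R = I / S, clipping the result to [0, 1]."""
    return [
        [
            tuple(min(1.0, max(0.0, i / max(s, EPS))) for i, s in zip(ip, sp))
            for ip, sp in zip(i_row, s_row)
        ]
        for i_row, s_row in zip(image, shading)
    ]


if __name__ == "__main__":
    # Hypothetical 1x2 document: white paper and dark ink.
    paper_and_ink = [[(1.0, 1.0, 1.0), (0.1, 0.1, 0.1)]]
    # Hypothetical shading: a warm lamp on the paper, a bluish shadow on the ink.
    lighting = [[(0.9, 0.7, 0.5), (0.3, 0.3, 0.35)]]
    observed = compose(paper_and_ink, lighting)
    print(observed)
    print(recover_reflectance(observed, lighting))
```

For a real photograph neither factor is given; the learned model is what estimates the reflectance directly. The division step only shows why disentangling shading from reflectance is enough to recover a shading-free, uniformly lit version of the document.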

updated: Sun Nov 29 2020 21:39:58 GMT+0000 (UTC)

published: Sun Nov 29 2020 21:39:58 GMT+0000 (UTC)

arXiv
