Text Reading Order in Uncontrolled Conditions by Sparse Graph Segmentation

Renshen Wang; Yasuhisa Fujii; Alessandro Bissacco

Text reading order is a crucial aspect of the output of an OCR engine, with a large impact on downstream tasks. Its difficulty lies in the large variation of domain-specific layout structures, and it is further exacerbated by real-world image degradations such as perspective distortions. We propose a lightweight, scalable and generalizable approach to identify text reading order with a multi-modal, multi-task graph convolutional network (GCN) running on a sparse layout-based graph. Predictions from the model provide hints of two-dimensional relations among text lines and layout region structures, upon which a post-processing cluster-and-sort algorithm generates an ordered sequence of all the text lines. The model is language-agnostic and runs effectively across multi-language datasets that contain various types of images taken in uncontrolled conditions, and it is small enough to be deployed on virtually any platform, including mobile devices.
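
To make the cluster-and-sort idea concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the abstract gives no details of the post-processing, so everything below is an illustrative assumption. In particular, the `TextLine` type, the `region` field (standing in for a model-predicted layout region), and the ordering rules (regions left to right, lines top to bottom within a region) are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TextLine:
    text: str
    x: float     # left edge of the line's bounding box
    y: float     # top edge of the line's bounding box
    region: int  # hypothetical region id, standing in for a model prediction


def cluster_and_sort(lines):
    """Group lines by region, order each group, then order the groups.

    Assumed rules (not from the paper): lines inside a region read
    top to bottom, and regions read left to right, then top to bottom.
    """
    clusters = defaultdict(list)
    for line in lines:
        clusters[line.region].append(line)

    # Sort lines inside each region: top to bottom, ties broken left to right.
    for members in clusters.values():
        members.sort(key=lambda l: (l.y, l.x))

    # Order regions by their leftmost, then topmost, coordinate.
    ordered_regions = sorted(
        clusters.values(),
        key=lambda members: (min(l.x for l in members), min(l.y for l in members)),
    )
    return [line for members in ordered_regions for line in members]


# A two-column page: a plain top-to-bottom, left-to-right sort would
# interleave the columns, while grouping by region keeps each column intact.
lines = [
    TextLine("C", 100, 0, region=1),
    TextLine("A", 0, 0, region=0),
    TextLine("D", 100, 10, region=1),
    TextLine("B", 0, 10, region=0),
]
ordered = [l.text for l in cluster_and_sort(lines)]
```

In this toy case the region ids are given by hand; in the approach described above they would be derived from the model's predictions, and the per-region ordering rule could differ between regions.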

updated: Thu May 04 2023 06:21:00 GMT+0000 (UTC)

published: Thu May 04 2023 06:21:00 GMT+0000 (UTC)

arXiv
