Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter

Tianwei Wang; Yuanzhi Zhu; Lianwen Jin; Dezhi Peng; Zhe Li; Mengchao He; Yongpan Wang; Canjie Luo

Text recognition is a popular research subject with many associated challenges. Despite considerable progress in recent years, the text recognition task itself is still restricted to reading cropped text-line images and serves as a subtask of optical character recognition (OCR) systems. As a result, the final text recognition result is limited by the performance of the text detector. In this paper, we propose a simple, elegant and effective paradigm called Implicit Feature Alignment (IFA), which can be easily integrated into current text recognizers, resulting in a novel inference mechanism called IFAinference. This enables an ordinary text recognizer to process multi-line text, so that a separate text detection step is no longer needed. Specifically, we integrate IFA into the two most prevalent text recognition streams (attention-based and CTC-based) and propose attention-guided dense prediction (ADP) and Extended CTC (ExCTC). Furthermore, we propose the Wasserstein-based Hollow Aggregation Cross-Entropy (WH-ACE), which suppresses negative predictions to assist in training ADP and ExCTC. We experimentally demonstrate that IFA achieves state-of-the-art performance on end-to-end document recognition tasks while maintaining the fastest speed, and that ADP and ExCTC complement each other across different application scenarios. Code will be available at https://github.com/WangTianwei/Implicit-feature-alignment.

updated: Thu Jun 10 2021 17:06:28 GMT+0000 (UTC)

published: Thu Jun 10 2021 17:06:28 GMT+0000 (UTC)

arXiv
