PP-StructureV2: A Stronger Document Analysis System

Chenxia Li; Ruoyu Guo; Jun Zhou; Mengtao An; Yuning Du; Lingfeng Zhu; Yi Liu; Xiaoguang Hu; Dianhai Yu

A large amount of document data exists in unstructured form, such as raw images without any text information. Designing a practical document image analysis system is a meaningful but challenging task. In previous work, we proposed PP-Structure, an intelligent document analysis system. To further upgrade the functionality and performance of PP-Structure, we propose PP-StructureV2, which contains two subsystems: Layout Information Extraction and Key Information Extraction. First, we integrate an Image Direction Correction module and a Layout Restoration module to extend the functionality of the system. Second, PP-StructureV2 applies 8 practical strategies for better performance. For the Layout Analysis model, we introduce the ultra-lightweight detector PP-PicoDet and the knowledge distillation algorithm FGD to make the model lighter, which increases inference speed by 11 times with comparable mAP. For the Table Recognition model, we use PP-LCNet, CSP-PAN, and SLAHead to optimize the backbone, feature fusion, and decoding modules, respectively, which improves table structure accuracy by 6% with comparable inference speed. For the Key Information Extraction model, we introduce VI-LayoutXLM, a visual-feature-independent LayoutXLM architecture, together with the TB-YX sorting algorithm and the U-DML knowledge distillation algorithm, which improve Hmean by 2.8% on the Semantic Entity Recognition task and by 9.1% on the Relation Extraction task. All of the above models and code are open-sourced in the PaddleOCR GitHub repository.
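The abstract names the TB-YX sorting algorithm without describing it. As a rough illustration only, the sketch below assumes that TB-YX means ordering detected text boxes top to bottom by their y coordinate and, within a row, left to right by their x coordinate. The function name `sort_boxes_tb_yx`, the `Box` tuple layout, and the `y_tolerance` parameter are hypothetical and are not taken from the paper or from PaddleOCR.

```python
from typing import List, Tuple

# Hypothetical box layout for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def sort_boxes_tb_yx(boxes: List[Box], y_tolerance: float = 10.0) -> List[Box]:
    """Order boxes top to bottom, then left to right within each row.

    Assumption: boxes whose top edge lies within `y_tolerance` pixels of the
    first box in the current row are treated as belonging to that row.
    """
    # Coarse ordering by top edge (y), with x as a tie-breaker.
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))

    # Group consecutive boxes into rows using the tolerance on y.
    rows: List[List[Box]] = []
    for box in ordered:
        if rows and abs(box[1] - rows[-1][0][1]) <= y_tolerance:
            rows[-1].append(box)
        else:
            rows.append([box])

    # Inside each row, read strictly left to right.
    return [box for row in rows for box in sorted(row, key=lambda b: b[0])]


example_boxes = [
    (200, 12, 300, 30),  # first row, right
    (10, 15, 100, 33),   # first row, left (slightly lower top edge)
    (10, 60, 100, 80),   # second row, left
    (150, 58, 250, 78),  # second row, right
]
reading_order = sort_boxes_tb_yx(example_boxes)
```

The tolerance matters because OCR boxes on the same visual line rarely share an identical top edge; a plain sort on (y, x) would place the right-hand box of the first row before the left-hand one in this example.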

updated: Thu Oct 13 2022 07:11:59 GMT+0000 (UTC)

published: Tue Oct 11 2022 12:07:32 GMT+0000 (UTC)

arXiv
