ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images

Wenwen Yu; Chengquan Zhang; Haoyu Cao; Wei Hua; Bohan Li; Huang Chen; Mingyu Liu; Mingrui Chen; Jianfeng Kuang; Mengjun Cheng; Yuning Du; Shikun Feng; Xiaoguang Hu; Pengyuan Lyu; Kun Yao; Yuechen Yu; Yuliang Liu; Wanxiang Che; Errui Ding; Cheng-Lin Liu; Jiebo Luo; Shuicheng Yan; Min Zhang; Dimosthenis Karatzas; Xing Sun; Jingdong Wang; Xiang Bai

Structured text extraction is one of the most valuable and challenging application directions in the field of Document AI. However, the scenarios covered by past benchmarks are limited, and the corresponding evaluation protocols usually focus on submodules of the structured text extraction pipeline. To address these problems, we organized the ICDAR 2023 competition on Structured text extraction from Visually-Rich Document images (SVRD). We set up two tracks for SVRD, Track 1: HUST-CELL and Track 2: Baidu-FEST. HUST-CELL aims to evaluate the end-to-end performance of Complex Entity Linking and Labeling, and Baidu-FEST focuses on evaluating the performance and generalization of Zero-shot / Few-shot Structured Text extraction from an end-to-end perspective. Compared to current document benchmarks, the benchmarks for our two tracks greatly enrich the scenarios and contain more than 50 types of visually-rich document images (mainly from actual enterprise applications). The competition opened on 30th December, 2022 and closed on 24th March, 2023. We received 91 valid submissions from 35 participants for Track 1, and 26 valid submissions from 15 participants for Track 2. In this report we present the motivation, competition datasets, task definition, evaluation protocol, and submission summaries. Based on the performance of the submissions, we believe there is still a large gap between current and expected information extraction performance in complex and zero-shot scenarios. We hope that this competition will attract many researchers in the fields of CV and NLP, and bring new ideas to the field of Document AI.

updated: Mon Jun 05 2023 22:20:52 GMT+0000 (UTC)

published: Mon Jun 05 2023 22:20:52 GMT+0000 (UTC)

arXiv
