arXiv reaDer
ViBERTgrid BiLSTM-CRF: Multimodal Key Information Extraction from Unstructured Financial Documents
Multimodal key information extraction (KIE) models have been studied extensively on semi-structured documents. However, their investigation on unstructured documents is an emerging research topic. The paper presents an approach to adapt a multimodal transformer (i.e., ViBERTgrid previously explored on semi-structured documents) for unstructured financial documents, by incorporating a BiLSTM-CRF layer. The proposed ViBERTgrid BiLSTM-CRF model demonstrates a significant improvement in performance (up to 2 percentage points) on named entity recognition from unstructured documents in financial domain, while maintaining its KIE performance on semi-structured documents. As an additional contribution, we publicly released token-level annotations for the SROIE dataset in order to pave the way for its use in multimodal sequence labeling models.
updated: Mon Sep 23 2024 13:28:06 GMT+0000 (UTC)
published: Mon Sep 23 2024 13:28:06 GMT+0000 (UTC)
参考文献 (このサイトで利用可能なもの) / References (only if available on this site)
被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)
Amazon.co.jpアソシエイト