Towards Few-shot Entity Recognition in Document Images: A Graph Neural Network Approach Robust to Image Manipulation

Prashant Krishnan; Zilong Wang; Yangkun Wang; Jingbo Shang



Recent advances in incorporating layout information, typically bounding box coordinates, into pre-trained language models have yielded strong performance in entity recognition from document images. Coordinates make it easy to model the absolute position of each token, but they can be sensitive to manipulations of document images (e.g., shifting, rotation, or scaling), especially when training data is limited, as in few-shot settings. In this paper, we propose to additionally introduce the topological adjacency relationships among tokens, emphasizing their relative position information. Specifically, we treat the tokens in a document as nodes and form edges using topological heuristics based on the k-nearest bounding boxes. Such adjacency graphs are invariant to affine transformations including shifting, rotation, and scaling. We incorporate these graphs into the pre-trained language model by adding graph neural network layers on top of the language model embeddings, yielding a novel model, LAGER. Extensive experiments on two benchmark datasets show that LAGER significantly outperforms strong baselines under different few-shot settings and is also more robust to manipulations.
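To make the adjacency idea concrete, the following is a minimal sketch of building a k-nearest-neighbor graph over token bounding boxes and checking that its edges survive a shift, rotation, and uniform scaling. It is an illustration only: it uses plain Euclidean distance between box centers, which is an assumption and not necessarily the topological heuristic used in LAGER, and the helper names (`box_center`, `knn_edges`, `transform`) and the sample boxes are hypothetical.

```python
import math


def box_center(box):
    """Center (x, y) of an axis-aligned box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


def knn_edges(points, k):
    """Directed edges (i, j) where j is among the k nearest points to i."""
    edges = set()
    for i, p in enumerate(points):
        # Distance from point i to every other point, closest first.
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for _, j in others[:k]:
            edges.add((i, j))
    return edges


def transform(points, angle=0.0, scale=1.0, shift=(0.0, 0.0)):
    """Rotate by `angle` (radians), scale uniformly, then shift."""
    c, s = math.cos(angle), math.sin(angle)
    return [
        (scale * (c * x - s * y) + shift[0], scale * (s * x + c * y) + shift[1])
        for x, y in points
    ]


# Hypothetical token boxes; values chosen so no two distances are tied.
boxes = [
    (10, 10, 50, 20),
    (60, 10, 120, 20),
    (10, 40, 90, 50),
    (100, 42, 130, 52),
    (15, 80, 70, 90),
]
centers = [box_center(b) for b in boxes]

edges = knn_edges(centers, k=2)
moved = transform(centers, angle=0.3, scale=1.7, shift=(25.0, -40.0))
moved_edges = knn_edges(moved, k=2)

# The coordinates changed, but the neighbor structure did not.
print(edges == moved_edges)
```

The coordinates of every token change under the transformation, while the edge set stays the same, because rotation, shifting, and uniform scaling preserve the ordering of pairwise distances. This check does not cover other transformations, and in a full model the resulting edges would be passed to graph neural network layers together with the language model embeddings.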

updated: Fri Feb 23 2024 05:36:02 GMT+0000 (UTC)

published: Wed May 24 2023 07:34:33 GMT+0000 (UTC)

arXiv

