DATa: Domain Adaptation-Aided Deep Table Detection Using Visual-Lexical Representations

Hyebin Kwon; Joungbin An; Dongwoo Lee; Won-Yong Shin

Table detection has received considerable research attention, through both rule-based approaches that rely on hand-crafted heuristics and deep learning approaches. Although recent studies report improved table detection results, their performance often degrades when the models are applied to a new domain whose table layout features differ from those of the source domain on which the underlying model was trained. To overcome this problem, we present DATa, a novel Domain Adaptation-aided deep Table detection method that guarantees satisfactory performance in a specific target domain where few trusted labels are available. To this end, we design new lexical features and an augmented model used for re-training. More specifically, after pre-training one of the state-of-the-art vision-based models as our backbone network, we re-train our augmented model, which consists of the vision-based model and a multilayer perceptron (MLP) architecture. We then compute each confidence score more accurately by combining the new confidence scores obtained from the trained MLP with the initial predictions of bounding boxes and their confidence scores. To validate the superiority of DATa, we perform experimental evaluations using a real-world benchmark dataset as the source domain and another dataset, consisting of materials science articles, as the target domain. Experimental results demonstrate that the proposed DATa method substantially outperforms competing methods that use only visual representations in the target domain. These gains stem from the ability to eliminate a large number of false positives or false negatives, depending on the confidence score threshold setting.
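
The score refinement step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the MLP-based scores and the initial detector scores are combined, so the convex combination below is purely an assumption for illustration, not the authors' method. The names `Detection`, `fuse_scores`, `filter_detections`, and the parameter `alpha` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    """One candidate table region (hypothetical container, not from the paper)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    visual_score: float   # initial confidence from the vision-based detector
    lexical_score: float  # confidence from the MLP over lexical features


def fuse_scores(det: Detection, alpha: float = 0.5) -> float:
    """Assumed fusion rule: weighted average of the two confidence scores.

    The paper's abstract only says the scores are combined; the actual
    rule may differ.
    """
    return alpha * det.visual_score + (1.0 - alpha) * det.lexical_score


def filter_detections(
    dets: List[Detection], threshold: float = 0.5, alpha: float = 0.5
) -> List[Detection]:
    """Keep detections whose fused confidence reaches the threshold."""
    return [d for d in dets if fuse_scores(d, alpha) >= threshold]


if __name__ == "__main__":
    candidates = [
        # Visually table-like, but the text inside does not look tabular.
        Detection((0, 0, 10, 10), visual_score=0.9, lexical_score=0.2),
        # Moderately table-like visually, strongly tabular lexically.
        Detection((5, 5, 20, 20), visual_score=0.7, lexical_score=0.9),
    ]
    for d in filter_detections(candidates, threshold=0.6):
        print(d.box, round(fuse_scores(d), 2))
```

In this sketch, raising `threshold` removes more false positives at the cost of more false negatives, and lowering it does the opposite. This mirrors the trade-off the abstract attributes to the confidence score threshold setting.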

updated: Sat Nov 12 2022 12:14:16 GMT+0000 (UTC)

published: Sat Nov 12 2022 12:14:16 GMT+0000 (UTC)

arXiv
