ST-KeyS: Self-Supervised Transformer for Keyword Spotting in Historical Handwritten Documents

Sana Khamekhem Jemni; Sourour Ammar; Mohamed Ali Souibgui; Yousri Kessentini; Abbas Cheddad

Keyword spotting (KWS) in historical documents is an important tool for the initial exploration of digitized collections. The most efficient KWS methods currently rely on machine learning techniques that require a large amount of annotated training data. For historical manuscripts, however, annotated training corpora are scarce. To handle this data scarcity, we investigate the merits of self-supervised learning, which extracts useful representations of the input data without relying on human annotations and then uses these representations in the downstream task. We propose ST-KeyS, a masked auto-encoder model based on vision transformers whose pretraining stage follows the mask-and-predict paradigm and needs no labeled data. In the fine-tuning stage, the pretrained encoder is integrated into a siamese neural network model that is fine-tuned to improve the feature embeddings of the input images. We further improve the image representation using the pyramidal histogram of characters (PHOC) embedding to create and exploit an intermediate representation of images based on text attributes. In an exhaustive experimental evaluation on three widely used benchmark datasets (Botany, Alvermann Konzilsprotokolle and George Washington), the proposed approach outperforms state-of-the-art methods trained on the same datasets.
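To make the PHOC embedding mentioned in the abstract more concrete, the following is a minimal sketch of how a PHOC vector can be computed from a word string. It is not the authors' implementation: the alphabet (lowercase letters plus digits), the pyramid levels (1, 2, 3), and the function name `phoc` are assumptions chosen for illustration, and the bigram components used in some PHOC variants are omitted. A character is assigned to a region of the word when at least half of the character's span falls inside that region.

```python
import string

# Assumed alphabet and pyramid levels; the abstract does not specify either.
ALPHABET = string.ascii_lowercase + string.digits
LEVELS = (1, 2, 3)


def phoc(word, alphabet=ALPHABET, levels=LEVELS):
    """Return a binary PHOC vector for `word` (unigrams only)."""
    word = word.lower()
    n = len(word)
    vec = []
    for level in levels:
        for region in range(level):
            bits = [0] * len(alphabet)
            # Work in integer units of 1 / (n * level) to avoid float rounding.
            r_lo, r_hi = region * n, (region + 1) * n
            for i, ch in enumerate(word):
                if ch not in alphabet:
                    continue  # characters outside the alphabet are ignored
                c_lo, c_hi = i * level, (i + 1) * level
                overlap = min(c_hi, r_hi) - max(c_lo, r_lo)
                # Keep the character if >= 50% of its span lies in the region.
                if 2 * overlap >= (c_hi - c_lo):
                    bits[alphabet.index(ch)] = 1
            vec.extend(bits)
    return vec
```

With these assumed settings, each word maps to a fixed-length binary vector of `len(ALPHABET) * sum(LEVELS)` entries, so word images and text queries can be compared in the same attribute space regardless of word length.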

updated: Mon Mar 06 2023 13:39:41 GMT+0000 (UTC)

published: Mon Mar 06 2023 13:39:41 GMT+0000 (UTC)

arXiv
