Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer

Wenqi Zhao; Liangcai Gao; Zuoyu Yan; Shuai Peng; Lin Du; Ziyin Zhang

Encoder-decoder models have recently made great progress on handwritten mathematical expression recognition. However, assigning attention to image features accurately is still a challenge for existing methods. Moreover, these encoder-decoder models usually adopt RNN-based models in their decoder, which makes them inefficient at processing long LaTeX sequences. In this paper, a transformer-based decoder replaces the RNN-based one, which makes the whole model architecture very concise. Furthermore, a novel training strategy is introduced to fully exploit the potential of the transformer in bidirectional language modeling. Compared with several methods that do not use data augmentation, experiments show that our model improves the ExpRate of the current state-of-the-art methods on CROHME 2014 by 2.23%. Similarly, on CROHME 2016 and CROHME 2019, it improves the ExpRate by 1.92% and 2.28%, respectively.
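The abstract does not spell out how the bidirectional training data is built. One way to train a single transformer decoder in both reading directions is to pair every LaTeX token sequence with its reversal and apply teacher forcing to each copy. The following is a minimal sketch under that assumption, not the authors' implementation. The function name `bidirectional_targets` and the special tokens `<sos>` and `<eos>` are hypothetical. Swapping the roles of the start and end markers for the reversed copy is an illustrative choice that lets the decoder tell the two directions apart.

```python
from typing import List, Tuple

# Hypothetical special tokens; the abstract does not specify the vocabulary.
SOS = "<sos>"
EOS = "<eos>"


def bidirectional_targets(tokens: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Build (decoder_input, decoder_target) pairs for both reading directions.

    Assumption (illustrative, not taken from the paper's text):
    - left-to-right copy: input starts with SOS, target ends with EOS;
    - right-to-left copy: the token order is reversed and the two markers
      swap roles, so the reversed input starts with EOS and ends with SOS.
    With teacher forcing, the target is the input shifted left by one position.
    """
    l2r = list(tokens)
    r2l = list(reversed(tokens))
    return [
        ([SOS] + l2r, l2r + [EOS]),  # left-to-right pair
        ([EOS] + r2l, r2l + [SOS]),  # right-to-left pair
    ]


if __name__ == "__main__":
    # Example LaTeX token sequence for "x^2".
    for decoder_input, decoder_target in bidirectional_targets(["x", "^", "2"]):
        print(decoder_input, "->", decoder_target)
```

Under this scheme each training expression contributes two sequences, so the effective batch doubles. Because the same decoder weights are shared across both directions, it learns both left-to-right and right-to-left language modeling.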

updated: Sun May 16 2021 08:47:18 GMT+0000 (UTC)

published: Thu May 06 2021 03:11:54 GMT+0000 (UTC)

arXiv
