CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition

Wenqi Zhao; Liangcai Gao

The Transformer-based encoder-decoder architecture has recently made significant advances in recognizing handwritten mathematical expressions. However, the Transformer model still suffers from the lack-of-coverage problem, which leaves its expression recognition rate (ExpRate) below that of its RNN counterparts. Coverage information, which records the alignment information of past decoding steps, has proven effective in RNN models. In this paper, we propose CoMER, a model that adopts coverage information in the Transformer decoder. Specifically, we propose a novel Attention Refinement Module (ARM) that refines the attention weights with past alignment information without hurting the decoder's parallelism. Furthermore, we take coverage information to the extreme by proposing self-coverage and cross-coverage, which utilize past alignment information from the current and previous layers, respectively. Experiments show that CoMER improves the ExpRate by 0.61%/2.09%/1.59% over the current state-of-the-art model, reaching 59.33%/59.81%/62.97% on the CROHME 2014/2016/2019 test sets.
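The idea of refining attention with past alignments, while keeping parallelism, can be illustrated with a small sketch. It assumes coverage is the exclusive cumulative sum of earlier steps' attention weights, so every step's coverage is derived from an attention matrix that already exists, not from a recurrence. The function names, the fixed scalar penalty `gamma` (a stand-in for whatever learned function ARM actually uses), and the choice to combine self- and cross-coverage by plain addition are all hypothetical. This is not the authors' implementation.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def coverage(attn: Matrix) -> Matrix:
    """Exclusive prefix sum over decoding steps.

    attn[t][j] is the weight that step t puts on image position j.
    Row t of the result sums rows 0..t-1, so step t only sees alignments
    of strictly earlier steps. With teacher forcing the whole T x L matrix
    is available at once, so this is a cumulative sum over an existing
    tensor rather than a step-by-step recurrence through the decoder.
    """
    running = [0.0] * len(attn[0])
    cov: Matrix = []
    for row in attn:
        cov.append(list(running))
        running = [r + a for r, a in zip(running, row)]
    return cov


def refine_attention(scores: Matrix, gamma: float = 1.0,
                     prev_layer_attn: Optional[Matrix] = None) -> Matrix:
    """Hypothetical refinement: penalize positions that are already covered.

    scores: T x L pre-softmax attention scores of the current layer.
    gamma: hypothetical fixed penalty (assumption; stands in for a learned module).
    prev_layer_attn: optional T x L attention weights from the previous layer.
    """
    raw = [softmax(row) for row in scores]      # unrefined weights of this layer
    cov = coverage(raw)                         # "self-coverage" signal
    if prev_layer_attn is not None:             # "cross-coverage" signal
        cross = coverage(prev_layer_attn)
        # Assumption: combine the two signals by elementwise addition.
        cov = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(cov, cross)]
    # Subtract the coverage penalty from the scores, then renormalize each step.
    return [softmax([s - gamma * c for s, c in zip(s_row, c_row)])
            for s_row, c_row in zip(scores, cov)]


if __name__ == "__main__":
    scores = [[2.0, 0.0], [2.0, 0.0]]  # both steps initially prefer position 0
    raw = [softmax(r) for r in scores]
    refined = refine_attention(scores)
    print("raw step 1:    ", [round(p, 3) for p in raw[1]])
    print("refined step 1:", [round(p, 3) for p in refined[1]])
```

In the usage example, step 0 is unchanged (it has no history), while step 1 shifts some weight away from position 0 because step 0 already attended there. Since each row of the penalty depends only on the attention matrix, all decoding steps can be refined in one pass, which is the property the abstract refers to as not hurting parallelism.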

updated: Sun Jul 10 2022 07:59:23 GMT+0000 (UTC)

published: Sun Jul 10 2022 07:59:23 GMT+0000 (UTC)

arXiv