Handwritten Mathematical Expression Recognition via Attention Aggregation based Bi-directional Mutual Learning

Xiaohang Bian; Bo Qin; Xiaozhe Xin; Jianwu Li; Xuefeng Su; Yanfeng Wang

Handwritten mathematical expression recognition aims to automatically generate LaTeX sequences from given images. Attention-based encoder-decoder models are currently widely used for this task. They typically generate target sequences in a left-to-right (L2R) manner, leaving right-to-left (R2L) context unexploited. In this paper, we propose an Attention aggregation based Bi-directional Mutual learning Network (ABM), which consists of one shared encoder and two parallel inverse decoders (L2R and R2L). The two decoders are enhanced via mutual distillation, which involves one-to-one knowledge transfer at each training step and makes full use of the complementary information from the two inverse directions. Moreover, to deal with mathematical symbols at diverse scales, an Attention Aggregation Module (AAM) is proposed to effectively integrate multi-scale coverage attentions. Notably, because the model has already learned knowledge from both directions during training, we use only the L2R branch in the inference phase, which keeps the original parameter size and inference speed. Extensive experiments demonstrate that the proposed approach achieves recognition accuracy of 56.85% on CROHME 2014, 52.92% on CROHME 2016, and 53.96% on CROHME 2019 without data augmentation or model ensembling, substantially outperforming state-of-the-art methods. The source code is available at https://github.com/XH-B/ABM.
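The mutual distillation between the two decoders can be illustrated with a minimal sketch. The abstract does not specify the loss, so the following is an assumption rather than the authors' implementation: it reverses the R2L decoder's per-step outputs so that they line up with the L2R steps, then averages a symmetric KL divergence over the steps. The function names, the temperature parameter, and the choice of symmetric KL are all hypothetical, and special start/end tokens are ignored.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over one step's logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_distillation_loss(l2r_logits, r2l_logits, temperature=1.0):
    """Hypothetical mutual-distillation term between two inverse decoders.

    l2r_logits: T steps of V logits each, in left-to-right order.
    r2l_logits: T steps of V logits each, in right-to-left order.
    Assumption: step t of the L2R decoder corresponds to step T-1-t
    of the R2L decoder, so the R2L sequence is reversed before comparing.
    """
    if not l2r_logits or len(l2r_logits) != len(r2l_logits):
        raise ValueError("both decoders must emit the same non-zero number of steps")
    aligned_r2l = r2l_logits[::-1]  # align R2L steps with L2R positions
    total = 0.0
    for l2r_step, r2l_step in zip(l2r_logits, aligned_r2l):
        p = softmax(l2r_step, temperature)
        q = softmax(r2l_step, temperature)
        # Each decoder treats the other's distribution as a soft target.
        total += kl_divergence(p, q) + kl_divergence(q, p)
    return total / len(l2r_logits)


if __name__ == "__main__":
    l2r = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
    r2l_agreeing = l2r[::-1]   # same predictions, emitted right to left
    r2l_disagreeing = l2r      # mismatched once reversed for alignment
    print(mutual_distillation_loss(l2r, r2l_agreeing))
    print(mutual_distillation_loss(l2r, r2l_disagreeing))
```

In such a setup the term would be added to each decoder's usual cross-entropy loss during training only. At inference the R2L branch is simply not evaluated, which is consistent with the abstract's statement that parameter size and inference speed are unchanged.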

updated: Wed Feb 23 2022 08:30:21 GMT+0000 (UTC)

published: Tue Dec 07 2021 09:53:40 GMT+0000 (UTC)

arXiv
