Multi-modal Text Recognition Networks: Interactive Enhancements between Visual and Semantic Features

Byeonghu Na; Yoonsik Kim; Sungrae Park

Linguistic knowledge has brought great benefits to scene text recognition by providing semantics to refine character sequences. However, since linguistic knowledge has been applied in isolation to the output sequence, previous methods have not fully utilized the semantics to understand visual clues for text recognition. This paper introduces a novel method, called Multi-modAl Text Recognition Network (MATRN), that enables interactions between visual and semantic features for better recognition performance. Specifically, MATRN identifies visual and semantic feature pairs and encodes spatial information into semantic features. Based on the spatial encoding, visual and semantic features are enhanced by referring to related features in the other modality. Furthermore, MATRN encourages the fusion of semantic features into visual features by hiding visual clues related to a character during the training phase. Our experiments demonstrate that MATRN achieves state-of-the-art performance on seven benchmarks by large margins, while naive combinations of the two modalities show only marginal improvements. Further ablation studies demonstrate the effectiveness of our proposed components. Our implementation is publicly available at https://github.com/wp03052/MATRN.
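The three ideas named in the abstract (spatial encoding of semantic features, cross-modal enhancement, and hiding character-related visual clues) can be illustrated with a small toy sketch. This is not the authors' implementation: the function names, the tiny feature vectors, the character-to-position attention map, and the use of plain residual dot-product attention are all assumptions made for illustration only. The actual model is in the linked repository.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    # Plain nested-list matrix product: (n x k) times (k x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add_rows(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def spatially_encode(semantic, char_to_visual_attn, visual_positions):
    # Assumption: each character inherits the attention-weighted average
    # of the position encodings of the visual locations it attends to.
    return add_rows(semantic, matmul(char_to_visual_attn, visual_positions))

def cross_attend(queries, other_modality):
    # Residual scaled dot-product attention: every query feature is
    # enhanced with a weighted mix of features from the other modality.
    scale = math.sqrt(len(queries[0]))
    scores = matmul(queries, transpose(other_modality))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return add_rows(queries, matmul(weights, other_modality))

def mask_visual_for_char(visual, char_to_visual_attn, char_index, k):
    # Training-time trick (hypothetical form): zero the k visual positions
    # most attended by one character, so it must be inferred from semantics.
    row = char_to_visual_attn[char_index]
    hidden = set(sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k])
    return [[0.0] * len(v) if i in hidden else list(v) for i, v in enumerate(visual)]

# Toy data: 4 visual positions and 2 characters, feature size 2.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
visual_positions = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
semantic = [[0.2, 0.8], [0.9, 0.1]]
attn = [[0.7, 0.2, 0.05, 0.05], [0.05, 0.05, 0.2, 0.7]]

semantic_sp = spatially_encode(semantic, attn, visual_positions)
masked_visual = mask_visual_for_char(visual, attn, char_index=0, k=1)
visual_in = add_rows(masked_visual, visual_positions)
enhanced_visual = cross_attend(visual_in, semantic_sp)
enhanced_semantic = cross_attend(semantic_sp, visual_in)
```

In this sketch both modalities keep their original shapes after enhancement, so the enhanced features could be passed to any downstream character classifier. Masking is applied only to the visual side, mirroring the abstract's description of hiding visual clues during training.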

updated: Sat Jan 22 2022 13:01:48 GMT+0000 (UTC)

published: Tue Nov 30 2021 10:22:11 GMT+0000 (UTC)

arXiv
