Performance Enhancement Leveraging Mask-RCNN on Bengali Document Layout Analysis

Shrestha Datta; Md Adith Mollah; Raisa Fairooz; Tariful Islam Fahim

Understanding digital documents, especially historical ones, is like solving a puzzle. Document Layout Analysis (DLA) helps with this puzzle by dividing documents into sections such as paragraphs, images, and tables, which is crucial for machines to read and understand them. In the DL Sprint 2.0 competition, we worked on understanding Bangla documents using BaDLAD, a dataset with a large number of examples. We trained a Mask R-CNN model for this task and improved it through step-by-step hyperparameter tuning, achieving a Dice score of 0.889. However, not everything went perfectly: we tried using a model trained on English documents, but it did not fit Bangla well, which showed us that each language has its own challenges. Our solution for DL Sprint 2.0 is publicly available at https://www.kaggle.com/competitions/dlsprint2/discussion/432201, along with notebooks, weights, and an inference notebook.
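
For readers unfamiliar with the metric, the Dice score reported above measures the overlap between a predicted mask and a ground-truth mask as twice the intersection divided by the sum of the two mask sizes. The following is a minimal sketch of that standard definition for a single pair of binary masks. The abstract does not say how the competition aggregates scores across classes or images, so the function name `dice_score`, the nested-list mask format, and the convention for two empty masks are all assumptions made here for illustration.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    # Flatten both masks row by row so they can be compared pixel by pixel.
    p = [int(bool(v)) for row in pred for v in row]
    t = [int(bool(v)) for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")

    # Pixels that are foreground in both masks.
    intersection = sum(a & b for a, b in zip(p, t))
    total = sum(p) + sum(t)

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example: each mask has 3 foreground pixels, 2 of which overlap.
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(dice_score(pred, target))  # 2 * 2 / (3 + 3)
```

A score of 1.0 means the two masks coincide exactly, and 0.0 means they share no foreground pixels.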

updated: Mon Aug 21 2023 06:51:58 GMT+0000 (UTC)

published: Mon Aug 21 2023 06:51:58 GMT+0000 (UTC)

arXiv
