SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning

Zhishen Yang; Raj Dabre; Hideki Tanaka; Naoaki Okazaki

In scholarly documents, figures provide a straightforward way of communicating scientific findings to readers. Automating figure caption generation helps move model understanding of scientific documents beyond text and will help authors write informative captions that facilitate the communication of scientific findings. Unlike previous studies, we reframe scientific figure captioning as a knowledge-augmented image captioning task, in which models need to utilize knowledge embedded across modalities to generate captions. To this end, we extend the large-scale SciCap dataset (Hsu et al., 2021) to SciCap+, which adds mention-paragraphs (paragraphs that mention figures) and OCR tokens. We then conduct experiments with M4C-Captioner (a multimodal transformer-based model with a pointer network) as a baseline for our study. Our results indicate that mention-paragraphs serve as additional context knowledge, which significantly boosts standard automatic image captioning evaluation scores compared to figure-only baselines. Human evaluations further reveal the challenges of generating figure captions that are informative to readers. The code and SciCap+ dataset will be publicly available at https://github.com/ZhishenYang/scientific_figure_captioning_dataset
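To make the notion of a mention-paragraph concrete, here is a minimal sketch of how paragraphs mentioning a given figure might be collected from a paper's body text. It relies on a hypothetical regular-expression heuristic. The function name `find_mention_paragraphs` and the matched patterns ("Figure N", "Figures N", "Fig. N", "Figs. N", optionally followed by a panel letter) are illustrative assumptions. They are not the SciCap+ construction pipeline, which the abstract does not describe.

```python
import re
from typing import List


def find_mention_paragraphs(paragraphs: List[str], figure_number: int) -> List[str]:
    """Return the paragraphs that explicitly refer to figure `figure_number`.

    Hypothetical heuristic (not the authors' method): a case-insensitive match on
    "Figure N", "Figures N", "Fig. N", "Figs. N" or "Fig N", where N may be
    followed by a panel letter ("3a") but not by another digit ("30").
    """
    pattern = re.compile(
        # \b stops matches inside words such as "configure";
        # (?![0-9]) stops "Figure 3" from matching "Figure 30".
        rf"\bfig(?:ures?|s?\.?)\s*{figure_number}(?![0-9])",
        re.IGNORECASE,
    )
    # Keep paragraph order so the mentions stay in reading order.
    return [p for p in paragraphs if pattern.search(p)]
```

With this heuristic, a paragraph such as "Figs. 3a and 4 compare the two baselines." counts as a mention of figure 3, but "Fig. 13 reports the ablation results." does not. References written as "Figures 2 and 3" would be missed for figure 3, so a real pipeline would likely need more robust reference resolution.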

updated: Tue Jun 06 2023 08:16:16 GMT+0000 (UTC)

published: Tue Jun 06 2023 08:16:16 GMT+0000 (UTC)

arXiv
