COSMic: A Coherence-Aware Generation Metric for Image Descriptions

Mert İnan; Piyush Sharma; Baber Khalid; Radu Soricut; Matthew Stone; Malihe Alikhani

Developers of text generation models rely on automated evaluation metrics as a stand-in for slow and expensive manual evaluations. However, image captioning metrics have struggled to give accurate learned estimates of the semantic and pragmatic success of output text. We address this weakness by introducing the first discourse-aware learned generation metric for evaluating image descriptions. Our approach is inspired by computational theories of discourse for capturing information goals using coherence. We present a dataset of image–description pairs annotated with coherence relations. We then train a coherence-aware metric on a subset of the Conceptual Captions dataset and measure its effectiveness (its ability to predict human ratings of output captions) on a test set composed of out-of-domain images. We demonstrate a higher Kendall Correlation Coefficient for our proposed metric with the human judgments for the results of a number of state-of-the-art coherence-aware caption generation models when compared to several other metrics including recently proposed learned metrics such as BLEURT and BERTScore.
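The evaluation described above compares metrics by how well their scores rank captions in agreement with human ratings. As a minimal sketch of that idea, the Python below computes Kendall's tau-a for two metrics against the same human ratings. The abstract does not say which tau variant or tooling the authors used, so the tau-a choice, the function name `kendall_tau_a`, and all the numbers are illustrative assumptions, not data from the paper.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical data: human ratings for five captions and the scores
# that two made-up metrics assign to the same captions.
human = [5, 4, 3, 2, 1]
metric_x = [0.9, 0.7, 0.8, 0.3, 0.1]
metric_y = [0.2, 0.9, 0.1, 0.8, 0.5]

tau_x = kendall_tau_a(human, metric_x)  # 9 concordant, 1 discordant pair
tau_y = kendall_tau_a(human, metric_y)  # 5 concordant, 5 discordant pairs
print(f"metric_x tau = {tau_x:.2f}, metric_y tau = {tau_y:.2f}")
```

In this toy setup `metric_x` orders the captions almost the way the human raters do, so it gets the higher coefficient; a metric that correlates better with human judgments in this sense is the one the abstract would call more effective.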

updated: Sat Sep 11 2021 13:43:36 GMT+0000 (UTC)

published: Sat Sep 11 2021 13:43:36 GMT+0000 (UTC)

arXiv
