QACE: Asking Questions to Evaluate an Image Caption

Hwanhee Lee; Thomas Scialom; Seunghyun Yoon; Franck Dernoncourt; Kyomin Jung

In this paper, we propose QACE, a new metric based on Question Answering for Caption Evaluation. QACE generates questions on the evaluated caption and checks its content by asking those questions on either the reference caption or the source image. We first develop QACE-Ref, which compares the answers of the evaluated caption to those obtained from its reference, and report results competitive with state-of-the-art metrics. To go further, we propose QACE-Img, which asks the questions directly on the image instead of on the reference. QACE-Img requires a Visual-QA system. Unfortunately, standard VQA models are framed as classification over only a few thousand answer categories. Instead, we propose Visual-T5, an abstractive VQA system. The resulting metric, QACE-Img, is multimodal, reference-less, and explainable. Our experiments show that QACE-Img compares favorably with other reference-less metrics. We will release the pre-trained models to compute QACE.
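
The question-answering loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the released QACE code: the question generator and the answerer are passed in as callables (`generate_qa` and `answer` are hypothetical names), and answers are compared with SQuAD-style token F1, a comparison the abstract does not specify. For QACE-Ref, `source` is the reference caption; for QACE-Img it would be the image, read by an abstractive VQA model such as Visual-T5. The toy stubs only stand in for neural models.

```python
import re
import string
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical component signatures (assumptions, not the QACE release API):
#   generate_qa(caption) -> list of (question, answer extracted from the caption)
#   answer(question, source) -> answer string; source is a reference caption or an image
QAGenerator = Callable[[str], List[Tuple[str, str]]]
Answerer = Callable[[str, object], str]


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, then split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 between two answers (assumed comparison function)."""
    p, g = normalize(pred), normalize(gold)
    if not p or not g:
        return float(p == g)  # both empty -> 1.0, exactly one empty -> 0.0
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def qace_score(candidate: str, source: object,
               generate_qa: QAGenerator, answer: Answerer) -> float:
    """Average agreement between the candidate's answers and the source's answers."""
    pairs = generate_qa(candidate)
    if not pairs:
        return 0.0  # arbitrary choice for this sketch: no questions, no credit
    scores = [token_f1(answer(q, source), a) for q, a in pairs]
    return sum(scores) / len(scores)


# --- Toy stand-ins for the neural models (illustration only) ---
def toy_generate_qa(caption: str) -> List[Tuple[str, str]]:
    # A real system would generate questions from `caption`; these are fixed.
    return [("What is the dog holding?", "a red frisbee"),
            ("Where is the dog?", "on the beach")]


def toy_answer(question: str, reference: object) -> str:
    # A real QA model would read `reference`; this is a fixed lookup.
    lookup = {"What is the dog holding?": "a frisbee",
              "Where is the dog?": "in the park"}
    return lookup[question]


score = qace_score("A dog holds a red frisbee on the beach.",
                   "A dog with a frisbee in the park.",
                   toy_generate_qa, toy_answer)
print(round(score, 3))  # → 0.333
```

Passing the answerer as a parameter keeps the scoring loop identical for both variants; only the model that reads `source` changes.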

updated: Sat Aug 28 2021 03:04:28 GMT+0000 (UTC)

published: Sat Aug 28 2021 03:04:28 GMT+0000 (UTC)

arXiv