e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks

Maxime Kayser; Oana-Maria Camburu; Leonard Salewski; Cornelius Emde; Virginie Do; Zeynep Akata; Thomas Lukasiewicz

Recently, an increasing number of works have introduced models capable of generating natural language explanations (NLEs) for their predictions on vision-language (VL) tasks. Such models are appealing because they can provide human-friendly and comprehensive explanations. However, there is still no unified approach to evaluating the explanations these models generate. Moreover, there are currently only a few datasets of NLEs for VL tasks. In this work, we introduce e-ViL, a benchmark for explainable vision-language tasks that establishes a unified evaluation framework and provides the first comprehensive comparison of existing approaches that generate NLEs for VL tasks. e-ViL spans four models and three datasets. Both automatic metrics and human evaluation are used to assess model-generated explanations. We also introduce e-SNLI-VE, the largest existing VL dataset with NLEs (over 430k instances). Finally, we propose a new model that combines UNITER, which learns joint embeddings of images and text, and GPT-2, a pre-trained language model that is well suited for text generation. It surpasses the previous state of the art by a large margin across all datasets.

updated: Sat May 08 2021 18:46:33 GMT+0000 (UTC)

published: Sat May 08 2021 18:46:33 GMT+0000 (UTC)

arXiv
