e-SNLI-VE: Corrected Visual-Textual Entailment with Natural Language Explanations

Virginie Do; Oana-Maria Camburu; Zeynep Akata; Thomas Lukasiewicz

The recently proposed SNLI-VE corpus for recognising visual-textual entailment is a large, real-world dataset for fine-grained multimodal reasoning. However, the automatic way in which SNLI-VE has been assembled (via combining parts of two related datasets) gives rise to a large number of errors in the labels of this corpus. In this paper, we first present a data collection effort to correct the class with the highest error rate in SNLI-VE. Secondly, we re-evaluate an existing model on the corrected corpus, which we call SNLI-VE-2.0, and provide a quantitative comparison with its performance on the non-corrected corpus. Thirdly, we introduce e-SNLI-VE, which appends human-written natural language explanations to SNLI-VE-2.0. Finally, we train models that learn from these explanations at training time, and output such explanations at testing time.

updated: Thu Aug 19 2021 09:26:21 GMT+0000 (UTC)

published: Tue Apr 07 2020 23:12:51 GMT+0000 (UTC)

arXiv
