ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation

Wanrong Zhu; Xin Eric Wang; An Yan; Miguel Eckstein; William Yang Wang

Automatic evaluation of natural language generation (NLG) conventionally relies on token-level or embedding-level comparisons with text references. This differs from human language processing, in which visual imagination often improves comprehension. In this work, we propose ImaginE, an imagination-based automatic evaluation metric for natural language generation. With the help of CLIP and DALL-E, two cross-modal models pre-trained on large-scale image-text pairs, we automatically generate an image as the embodied imagination for each text snippet and compute the imagination similarity using contextual embeddings. Experiments spanning several text generation tasks show that adding imagination with ImaginE has great potential for introducing multi-modal information into NLG evaluation, and that it improves the correlation of existing automatic metrics with human similarity judgments in many circumstances.
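The similarity step described above can be illustrated with a minimal sketch. It assumes that each generated image has already been encoded into a fixed-length embedding vector and that the comparison is a cosine similarity; the abstract does not name the similarity function, so that choice, the function name `cosine_similarity`, and the toy vectors are all assumptions made for illustration rather than the authors' implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy stand-ins for the embeddings of two imagined images (assumed values):
# one for the reference text and one for the generated candidate text.
# In practice these would come from an image encoder, not be hand-written.
reference_imagination = [0.2, 0.7, 0.1]
candidate_imagination = [0.25, 0.6, 0.2]

score = cosine_similarity(reference_imagination, candidate_imagination)
print(f"imagination similarity: {score:.3f}")
```

A score near 1 would indicate that the two imagined images are close in embedding space. How such a score is combined with existing text metrics is not specified in the abstract and is left out of this sketch.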

updated: Thu Jun 10 2021 17:59:52 GMT+0000 (UTC)

published: Thu Jun 10 2021 17:59:52 GMT+0000 (UTC)

arXiv
