Ensemble of MRR and NDCG models for Visual Dialog

Idan Schwartz

Assessing an AI agent that can converse in human language and understand visual content is challenging. Generation metrics, such as BLEU scores, favor correct syntax over semantics. Hence, a discriminative approach is often used, where an agent ranks a set of candidate options. The mean reciprocal rank (MRR) metric evaluates model performance by taking into account the rank of a single human-derived answer. This approach, however, raises a new challenge: the ambiguity and synonymy of answers, for instance, semantically equivalent answers such as "yeah" and "yes". To address this, the normalized discounted cumulative gain (NDCG) metric has been used to capture the relevance of all the correct answers via dense annotations. However, the NDCG metric favors uncertain answers that are applicable to most questions, such as "I don't know". Crafting a model that excels on both MRR and NDCG metrics is challenging. Ideally, an AI agent should respond with a human-like reply and validate the correctness of any answer. To address this issue, we describe a two-step non-parametric ranking approach that can merge strong MRR and NDCG models. Using our approach, we manage to keep most of the MRR state-of-the-art performance (70.41% vs. 71.24%) and the NDCG state-of-the-art performance (72.16% vs. 75.35%). Moreover, our approach won the recent Visual Dialog 2020 challenge. Source code is available at https://github.com/idansc/mrr-ndcg.
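
To make the tension between the two metrics concrete, here is a minimal Python sketch that computes the reciprocal rank and NDCG for a single toy question and then merges two rankings. The abstract does not spell out the two steps, so `merge_two_step` is only one plausible reading (an assumption, not the authors' implementation): the MRR model fixes the first `top_k` positions and the NDCG model orders the remaining candidates. All function names, scores, and relevance values below are hypothetical and chosen for illustration.

```python
import math


def reciprocal_rank(ranking, gt_index):
    """1 / (1-based position of the single human answer in the ranking)."""
    return 1.0 / (ranking.index(gt_index) + 1)


def ndcg(ranking, relevance):
    """Standard NDCG over the full ranking with log2 discounting.

    relevance[c] is the dense-annotation relevance of candidate c.
    """
    dcg = sum(relevance[c] / math.log2(pos + 2) for pos, c in enumerate(ranking))
    ideal = sorted(relevance, reverse=True)
    idcg = sum(r / math.log2(pos + 2) for pos, r in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def rank_by_score(scores):
    """Candidate indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda c: -scores[c])


def merge_two_step(mrr_scores, ndcg_scores, top_k=1):
    """Hypothetical two-step merge (an assumption, see the text above).

    Step 1: the MRR model picks the first top_k positions.
    Step 2: the NDCG model orders every remaining candidate.
    No parameters are learned, only rank positions are combined.
    """
    head = rank_by_score(mrr_scores)[:top_k]
    chosen = set(head)
    tail = [c for c in rank_by_score(ndcg_scores) if c not in chosen]
    return head + tail


# Toy question with four candidates (illustrative values only).
candidates = ["yes", "yeah", "I don't know", "no"]
gt_index = 0                         # the single human answer is "yes"
relevance = [1.0, 1.0, 0.5, 0.0]     # made-up dense annotations

mrr_scores = [0.9, 0.2, 0.1, 0.3]    # made-up MRR-model scores
ndcg_scores = [0.5, 0.6, 0.9, 0.1]   # made-up NDCG-model scores

mrr_only = rank_by_score(mrr_scores)
ndcg_only = rank_by_score(ndcg_scores)
merged = merge_two_step(mrr_scores, ndcg_scores, top_k=1)

for name, ranking in [("MRR only", mrr_only), ("NDCG only", ndcg_only), ("merged", merged)]:
    rr = reciprocal_rank(ranking, gt_index)
    print(f"{name:9s} RR={rr:.3f} NDCG={ndcg(ranking, relevance):.3f}")
```

In this toy setup the merged ranking keeps the human answer in first place, as the MRR model does, while scoring a higher NDCG than the MRR-only ranking. Note that the official Visual Dialog evaluation may truncate the NDCG computation differently from the full-list version used here.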

updated: Mon Jun 21 2021 16:52:11 GMT+0000 (UTC)

published: Thu Apr 15 2021 15:09:32 GMT+0000 (UTC)

arXiv

