IMPROVE Visiolinguistic Performance with Re-Query

Stephan J. Lemmer; Jason J. Corso

We humans regularly ask for clarification if we are confused when discussing the visual world, yet the commonplace requirement in visiolinguistic problems like Visual Dialog, VQA, and Referring Expression Comprehension is to force a decision based on a single, static language input. Since this assumption does not match human practice, we relax it and allow our model to request new language inputs to refine the prediction for a task. Through the exemplar task of referring expression comprehension, we formalize and motivate the problem, introduce an evaluation method, and propose Iterative Multiplication of Probabilities for Re-query Of Verbal Expressions (IMPROVE) -- a re-query method that updates the model's prediction across multiple queries. We demonstrate IMPROVE on two different referring expression comprehension models and show it can improve accuracy by up to 6.23% without additional training or modification to the model's architecture.
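The abstract names the update rule but does not spell it out. As a rough illustration only, one plausible reading of "iterative multiplication of probabilities" is to multiply the per-candidate probabilities returned for each successive query and renormalize. The sketch below assumes that reading; the function names, the stopping threshold, and the `score_fn` interface are hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_queries(per_query_probs, eps=1e-12):
    """Multiply per-candidate probabilities across queries, then renormalize.

    per_query_probs: sequence of 1-D arrays, one per query, each holding a
    probability for every candidate object (assumed layout, not the paper's).
    """
    probs = np.asarray(per_query_probs, dtype=float)
    # Sum logs instead of multiplying directly to avoid underflow.
    log_product = np.log(probs + eps).sum(axis=0)
    log_product -= log_product.max()
    fused = np.exp(log_product)
    return fused / fused.sum()


def requery_loop(score_fn, queries, threshold=0.8):
    """Ask for further expressions until the fused prediction is confident.

    score_fn(query) is a hypothetical stand-in for any referring expression
    comprehension model that returns per-candidate probabilities.
    The threshold-based stopping rule is an assumption for illustration.
    """
    if not queries:
        raise ValueError("at least one query is required")
    history = []
    for query in queries:
        history.append(score_fn(query))
        fused = fuse_queries(history)
        if fused.max() >= threshold:
            break
    return int(fused.argmax()), fused, len(history)
```

Because the fusion operates only on the model's output probabilities, a scheme of this kind needs no retraining or architectural change, which is consistent with the claim in the abstract.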

updated: Thu May 26 2022 18:50:30 GMT+0000 (UTC)

published: Tue Oct 19 2021 19:01:30 GMT+0000 (UTC)

arXiv
