Vision Meets Definitions: Unsupervised Visual Word Sense Disambiguation Incorporating Gloss Information

Sunjae Kwon; Rishabh Garodia; Minhwa Lee; Zhichao Yang; Hong Yu

Visual Word Sense Disambiguation (VWSD) is the task of finding the image that most accurately depicts the correct sense of a target word in a given context. Previous image-text matching models often struggled to recognize polysemous words. This paper introduces an unsupervised VWSD approach that uses gloss information, specifically sense definitions, from an external lexical knowledge base. We propose using Bayesian inference to incorporate the sense definitions when the sense of the answer is not provided. In addition, to mitigate the out-of-dictionary (OOD) issue, we propose context-aware definition generation with GPT-3. Experimental results show that our Bayesian inference-based approach significantly improves VWSD performance. Moreover, our context-aware definition generation achieved a notable performance improvement on OOD examples, outperforming the existing definition generation method. We will release the source code as soon as possible.
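The abstract does not spell out the inference step. A minimal sketch, assuming the Bayesian step treats the sense as a latent variable and marginalizes it out, P(image | context) = Σ_s P(image | s) · P(s | context), could look like the following. The similarity inputs (for example, scores from an image-text matching model comparing each candidate image, and the context, against each sense definition), the softmax normalization, the `temperature` parameter, and the function names are all hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over a list of raw similarity scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def rank_images(
    image_def_sim: Sequence[Sequence[float]],  # [num_images][num_senses], hypothetical scores
    context_def_sim: Sequence[float],          # [num_senses], hypothetical scores
    temperature: float = 1.0,
) -> List[float]:
    """Return P(image | context) = sum over senses s of P(image | s) * P(s | context).

    Assumption: both conditional distributions are obtained by softmax-normalizing
    similarity scores; the paper may estimate them differently.
    """
    num_images = len(image_def_sim)
    num_senses = len(context_def_sim)

    # P(s | context): how well each sense definition fits the context.
    p_sense = softmax(context_def_sim, temperature)

    # P(image | s): for each sense, normalize over the candidate images.
    p_image_given_sense = []
    for s in range(num_senses):
        column = [image_def_sim[i][s] for i in range(num_images)]
        p_image_given_sense.append(softmax(column, temperature))

    # Marginalize out the unknown sense.
    return [
        sum(p_image_given_sense[s][i] * p_sense[s] for s in range(num_senses))
        for i in range(num_images)
    ]


if __name__ == "__main__":
    # Toy example: 3 candidate images, 2 senses; the context favors sense 0.
    scores = rank_images(
        image_def_sim=[[3.0, 0.0], [0.0, 3.0], [0.5, 0.5]],
        context_def_sim=[2.0, 0.0],
    )
    best = max(range(len(scores)), key=scores.__getitem__)
    print(best, [round(p, 3) for p in scores])
```

Because the sense is summed out rather than chosen outright, an image that matches a less likely sense still receives some probability mass, which is one way such an approach can behave when the answer's sense is not given.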

updated: Tue May 02 2023 21:33:10 GMT+0000 (UTC)

published: Tue May 02 2023 21:33:10 GMT+0000 (UTC)

arXiv
