Whether you can locate or not? Interactive Referring Expression Generation

Fulong Ye; Yuxing Long; Fangxiang Feng; Xiaojie Wang

Referring Expression Generation (REG) aims to generate unambiguous Referring Expressions (REs) for objects in a visual scene; its dual task, Referring Expression Comprehension (REC), locates the referred object. Existing methods build REG models in isolation, using only ground-truth REs for training and ignoring the potential interaction between REG and REC models. In this paper, we propose an Interactive REG (IREG) model that interacts with a real REC model: it uses a signal indicating whether the object was located, together with the visual region the REC model selected, to gradually modify REs. Experimental results on three RE benchmark datasets (RefCOCO, RefCOCO+, and RefCOCOg) show that IREG outperforms previous state-of-the-art methods on popular evaluation metrics. Furthermore, a human evaluation shows that, with this interaction capability, IREG generates better REs.
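The interaction described in the abstract can be pictured as a simple feedback loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function names (`iou`, `interactive_reg`, `toy_generate`, `toy_locate`), the round limit, and the use of an IoU threshold of 0.5 to decide whether the object counts as "located" are all hypothetical choices, and the two toy functions stand in for real REG and REC models.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def interactive_reg(
    generate: Callable[[Box, Optional[str], Optional[Box]], str],
    locate: Callable[[str], Box],
    target: Box,
    max_rounds: int = 3,          # assumed limit on revisions
    iou_threshold: float = 0.5,   # assumed criterion for "located"
) -> Tuple[str, bool, List[str]]:
    """Revise an expression until the REC stand-in finds the target."""
    expression = generate(target, None, None)  # initial RE, no feedback yet
    history = [expression]
    for round_idx in range(max_rounds + 1):
        predicted = locate(expression)
        if iou(predicted, target) >= iou_threshold:
            return expression, True, history
        if round_idx == max_rounds:
            break
        # Feed back the failed expression and the wrongly located region.
        expression = generate(target, expression, predicted)
        history.append(expression)
    return expression, False, history


# Toy scene: two dogs, and the target is the one on the right.
LEFT_DOG: Box = (0.0, 0.0, 10.0, 10.0)
RIGHT_DOG: Box = (20.0, 0.0, 30.0, 10.0)


def toy_generate(target: Box, previous: Optional[str], wrong: Optional[Box]) -> str:
    # Hypothetical REG stand-in: add a spatial cue once feedback arrives.
    return "dog" if previous is None else "dog on the right"


def toy_locate(expression: str) -> Box:
    # Hypothetical REC stand-in: picks the left dog unless told "right".
    return RIGHT_DOG if "right" in expression else LEFT_DOG


result = interactive_reg(toy_generate, toy_locate, RIGHT_DOG)
print(result)
```

In this toy run the first expression, "dog", is ambiguous and the REC stand-in picks the wrong region; the feedback then triggers a revised expression that resolves the ambiguity. A real system would replace both stand-ins with trained models and condition the revision on image features of the mistakenly located region.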

updated: Sat Aug 19 2023 10:53:32 GMT+0000 (UTC)

published: Sat Aug 19 2023 10:53:32 GMT+0000 (UTC)

arXiv
