Unified Questioner Transformer for Descriptive Question Generation in Goal-Oriented Visual Dialogue

Shoya Matsumori; Kosuke Shingyouchi; Yuki Abe; Yosuke Fukuchi; Komei Sugiura; Michita Imai

Building an interactive artificial intelligence that can ask questions about the real world is one of the biggest challenges in vision and language research. In particular, goal-oriented visual dialogue, in which an agent seeks information by asking questions during a turn-taking dialogue, has recently been gaining scholarly attention. Several models based on the GuessWhat?! dataset have been proposed, but their Questioner typically asks simple category-based questions or absolute spatial questions. This can be problematic in complex scenes where objects share attributes, or in cases where descriptive questions are required to distinguish objects. In this paper, we propose a novel Questioner architecture, called Unified Questioner Transformer (UniQer), for descriptive question generation with referring expressions. In addition, we build a goal-oriented visual dialogue task called CLEVR Ask, which synthesizes complex scenes that require the Questioner to generate descriptive questions. We train our model on two variants of the CLEVR Ask dataset. The results of the quantitative and qualitative evaluations show that UniQer outperforms the baseline.

updated: Tue Jun 29 2021 16:36:34 GMT+0000 (UTC)

published: Tue Jun 29 2021 16:36:34 GMT+0000 (UTC)

arXiv
