ConvGenVisMo: Evaluation of Conversational Generative Vision Models

Narjes Nikzad Khasmakhi; Meysam Asgari-Chenaghlu; Nabiha Asghar; Philipp Schaer; Dietlind Zühlke

Conversational generative vision models (CGVMs) like Visual ChatGPT (Wu et al., 2023) have recently emerged from the synthesis of computer vision and natural language processing techniques. These models enable more natural and interactive communication between humans and machines, because they can understand verbal inputs from users and generate responses in natural language along with visual outputs. To make informed decisions about the usage and deployment of these models, it is important to analyze their performance through a suitable evaluation framework on realistic datasets. In this paper, we present ConvGenVisMo, a framework for the novel task of evaluating CGVMs. ConvGenVisMo introduces a new benchmark evaluation dataset for this task, and also provides a suite of existing and new automated evaluation metrics to evaluate the outputs. All ConvGenVisMo assets, including the dataset and the evaluation code, will be made available publicly on GitHub.

updated: Sun May 28 2023 17:59:26 GMT+0000 (UTC)

published: Sun May 28 2023 17:59:26 GMT+0000 (UTC)

arXiv
