MMChat: Multi-Modal Chat Dataset on Social Media

Yinhe Zheng; Guanyi Chen; Xin Liu; Ke Lin

Incorporating multi-modal contexts in conversation is an important step for developing more engaging dialogue systems. In this work, we explore this direction by introducing MMChat, a large-scale multi-modal dialogue corpus (32.4M raw dialogues and 120.84K filtered dialogues). Unlike previous corpora that are crowd-sourced or collected from fictitious movies, MMChat contains image-grounded dialogues collected from real conversations on social media, in which a sparsity issue is observed: image-initiated dialogues in everyday communication may drift to non-image-grounded topics as the conversation proceeds. We develop a benchmark model to address this issue in dialogue generation tasks by adapting the attention routing mechanism to image features. Experiments demonstrate the usefulness of incorporating image features and the model's effectiveness in handling the sparsity of image features.

updated: Mon Aug 16 2021 15:27:49 GMT+0000 (UTC)

published: Mon Aug 16 2021 15:27:49 GMT+0000 (UTC)

arXiv
