MMChat: Multi-Modal Chat Dataset on Social Media

Yinhe Zheng; Guanyi Chen; Xin Liu; Jian Sun

Incorporating multi-modal contexts into conversation is an important step toward developing more engaging dialogue systems. In this work, we explore this direction by introducing MMChat, a large-scale Chinese multi-modal dialogue corpus (32.4M raw dialogues and 120.84K filtered dialogues). Unlike previous corpora that are crowd-sourced or collected from fictitious movies, MMChat contains image-grounded dialogues collected from real conversations on social media, in which a sparsity issue is observed: image-initiated dialogues in everyday communication may drift to topics that are not grounded in the image as the conversation proceeds. To better investigate this issue, we manually annotate 100K dialogues from MMChat and further filter the corpus accordingly, which yields MMChat-hf. We develop a benchmark model that addresses the sparsity issue in dialogue generation by adapting the attention routing mechanism to image features. Experiments demonstrate the usefulness of incorporating image features and the effectiveness of the model in handling their sparsity.

updated: Sat Apr 09 2022 02:04:48 GMT+0000 (UTC)

published: Mon Aug 16 2021 15:27:49 GMT+0000 (UTC)

arXiv
