Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding

Yidan Sun; Qin Chao; Yangfeng Ji; Boyang Li

Despite recent advances in AI, story understanding remains an open and under-investigated problem. We collect, preprocess, and publicly release a video-language story dataset, Synopses of Movie Narratives (SyMoN), containing 5,193 video summaries of popular movies and TV series with a total length of 869 hours. SyMoN captures naturalistic storytelling videos made by human creators and intended for a human audience. As a prototypical and naturalistic story dataset, SyMoN features high coverage of multimodal story events and abundant mental-state descriptions. Its use of storytelling techniques causes cross-domain semantic gaps that pose appropriate challenges for existing models. We establish benchmarks on video-text retrieval and zero-shot alignment on movie summary videos, which showcase the importance of in-domain data and long-term memory in story understanding. With SyMoN, we hope to lay the groundwork for progress in multimodal story understanding.

updated: Wed Apr 05 2023 02:09:02 GMT+0000 (UTC)

published: Fri Mar 11 2022 01:45:33 GMT+0000 (UTC)

arXiv
