Synopses of Movie Narratives: a Video-Language Dataset for Story Understanding

Yidan Sun; Qin Chao; Yangfeng Ji; Boyang Li

Despite recent advances in AI, story understanding remains an open and under-investigated problem. We collect, preprocess, and publicly release a video-language story dataset, Synopses of Movie Narratives (SYMON), containing 5,193 video summaries of popular movies and TV series. SYMON captures naturalistic storytelling videos made by human creators for human audiences. As a prototypical and naturalistic story dataset, SYMON features high coverage of multimodal story events, abundant mental-state descriptions, and large semantic gaps between the visual and textual modalities. We establish benchmarks on video-text retrieval and zero-shot alignment on movie summary videos, which showcase the importance of in-domain data in story understanding. With SYMON, we hope to lay the groundwork for progress in multimodal story understanding.

updated: Mon Apr 03 2023 03:52:14 GMT+0000 (UTC)

published: Fri Mar 11 2022 01:45:33 GMT+0000 (UTC)

arXiv
