Keep it Consistent: Topic-Aware Storytelling from an Image Stream via Iterative Multi-agent Communication

Ruize Wang; Zhongyu Wei; Ying Cheng; Piji Li; Haijun Shan; Ji Zhang; Qi Zhang; Xuanjing Huang

Visual storytelling aims to automatically generate a narrative paragraph from a sequence of images. Existing approaches construct a text description for each image independently and roughly concatenate the descriptions into a story, which yields semantically incoherent content. In this paper, we propose a new approach to visual storytelling that introduces a topic description task to detect the global semantic context of an image stream. A story is then constructed under the guidance of the topic description. To combine the two generation tasks, we propose a multi-agent communication framework that regards the topic description generator and the story generator as two agents and learns them simultaneously via an iterative updating mechanism. We validate our approach on the VIST dataset, where quantitative results, ablations, and human evaluation show that our method generates higher-quality stories than state-of-the-art methods.
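The alternating exchange between the two agents can be pictured with a toy sketch. This is only an illustration of the control flow, under strong assumptions: the abstract does not specify the models, losses, or update rule, so the names `topic_agent`, `story_agent`, and `communicate`, the tag-list stand-in for images, and the word-counting heuristics below are all hypothetical and are not the authors' implementation. No learning takes place here; the sketch shows only how a topic estimate and a story draft might be refined in turn.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical stand-in for one image's visual features: a list of tags.
ImageTags = List[str]


def topic_agent(images: List[ImageTags], story: List[str]) -> str:
    """Toy topic generator: return the most frequent tag, counting both
    the image tags and any tag words reused in the current story draft.
    Assumes at least one image with at least one tag."""
    counts = Counter(tag for tags in images for tag in tags)
    for sentence in story:
        # Only words that are known tags feed back into the topic estimate.
        counts.update(word for word in sentence.split() if word in counts)
    return counts.most_common(1)[0][0]


def story_agent(images: List[ImageTags], topic: str) -> List[str]:
    """Toy story generator: one 'sentence' per image, each anchored on the
    shared topic so that the sentences stay consistent with one another."""
    story = []
    for tags in images:
        details = [t for t in tags if t != topic] or [topic]
        story.append(f"{topic} : " + " ".join(details))
    return story


def communicate(images: List[ImageTags], rounds: int = 3) -> Tuple[str, List[str]]:
    """Alternate between the two agents for a fixed number of rounds."""
    topic, story = "", []
    for _ in range(rounds):
        topic = topic_agent(images, story)   # topic agent reads the draft
        story = story_agent(images, topic)   # story agent reads the topic
    return topic, story


if __name__ == "__main__":
    album = [["beach", "sand"], ["beach", "ball"], ["sunset", "beach"]]
    final_topic, final_story = communicate(album)
    print(final_topic)
    for line in final_story:
        print(line)
```

In a trained system each agent would be a neural generator and each round would also update parameters; the fixed round count here is an arbitrary choice made for the illustration.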

updated: Fri Oct 30 2020 07:08:10 GMT+0000 (UTC)

published: Mon Nov 11 2019 11:35:21 GMT+0000 (UTC)

arXiv
