Towards Coherent Visual Storytelling with Ordered Image Attention

Tom Braude; Idan Schwartz; Alexander Schwing; Ariel Shamir

We address the problem of visual storytelling, i.e., generating a story for a given sequence of images. While each sentence of the story should describe its corresponding image, a coherent story also needs to be consistent and relate to both future and past images. To achieve this, we develop ordered image attention (OIA). OIA models interactions between the sentence-corresponding image and important regions in the other images of the sequence. To highlight the important objects, a message-passing-like algorithm collects representations of those objects in an order-aware manner. To generate the story's sentences, we then highlight important image attention vectors with Image-Sentence Attention (ISA). Further, to alleviate common linguistic mistakes such as repetitiveness, we introduce an adaptive prior. The obtained results improve the METEOR score on the VIST dataset by 1%. In addition, an extensive human study verifies coherency improvements and shows that stories generated with OIA and ISA are more focused, shareable, and image-grounded.
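
The idea of attending over regions of all images in an order-aware way can be illustrated with a small sketch. This is not the paper's OIA implementation, which the abstract does not specify; it is a generic scaled dot-product attention in NumPy, and the function name `ordered_attention`, the mean-pooled query, and the relative-offset embedding table `order_emb` are all assumptions made for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def ordered_attention(regions, target, order_emb):
    """Hypothetical order-aware attention over image regions (illustrative only).

    regions:   (n_images, n_regions, d) region features for each image
    target:    index of the image whose sentence is being generated
    order_emb: (2 * n_images - 1, d) assumed embedding per relative offset
    Returns a (d,) context vector and (n_images, n_regions) attention weights.
    """
    n_images, n_regions, d = regions.shape
    # Assumption: the query is the mean region feature of the target image.
    query = regions[target].mean(axis=0)
    # Relative position of every image with respect to the target,
    # shifted so that it can index into order_emb.
    offsets = np.arange(n_images) - target + (n_images - 1)
    # Inject order information into the keys (past and future images differ).
    keys = regions + order_emb[offsets][:, None, :]
    scores = keys.reshape(-1, d) @ query / np.sqrt(d)
    weights = softmax(scores)
    # Weighted sum of the original region features.
    context = weights @ regions.reshape(-1, d)
    return context, weights.reshape(n_images, n_regions)


rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 4, 8))   # 5 images, 4 regions each, 8-dim features
order_emb = rng.normal(size=(9, 8))    # one embedding per relative offset in [-4, 4]
context, weights = ordered_attention(regions, target=2, order_emb=order_emb)
```

Here `weights[i, j]` indicates how much region `j` of image `i` contributes when describing image 2; because the keys carry a relative-offset embedding, the same region can be weighted differently depending on whether it comes before or after the target image.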

updated: Wed Aug 04 2021 17:12:39 GMT+0000 (UTC)

published: Wed Aug 04 2021 17:12:39 GMT+0000 (UTC)

arXiv
