GPT2MVS: Generative Pre-trained Transformer-2 for Multi-modal Video Summarization

Jia-Hong Huang; Luka Murn; Marta Mrak; Marcel Worring

Traditional video summarization methods generate a fixed video representation regardless of user interest. Such methods therefore constrain what users can expect in content search and exploration scenarios. Multi-modal video summarization is one of the approaches used to address this problem. When multi-modal video summarization is used to support video exploration, a text-based query is considered one of the main drivers of video summary generation, because it is user-defined. Effective encoding of both the text-based query and the video is thus important for the task of multi-modal video summarization. In this work, a new method is proposed that uses a specialized attention network and contextualized word representations to tackle this task. The proposed model consists of a contextualized video summary controller, multi-modal attention mechanisms, an interactive attention network, and a video summary generator. Evaluated on the existing multi-modal video summarization benchmark, the proposed model improves accuracy by 5.88% and F1-score by 4.06% compared with the state-of-the-art method.
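The idea that a text-based query drives summary generation can be illustrated with a minimal sketch. It is not the GPT2MVS architecture: it assumes the query and each frame have already been encoded as small vectors, uses plain scaled dot-product attention to weight frames by the query, and keeps the top-k frames. The names `query_attention`, `select_frames`, and `f1_score`, the toy vectors, and the frame-set F1 measure are all illustrative assumptions, not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def query_attention(query_vec, frame_vecs):
    """Scaled dot-product attention weights of one query over all frames."""
    scale = math.sqrt(len(query_vec))
    return softmax([dot(query_vec, f) / scale for f in frame_vecs])


def select_frames(query_vec, frame_vecs, k):
    """Return the indices of the k most query-relevant frames, in time order."""
    weights = query_attention(query_vec, frame_vecs)
    ranked = sorted(range(len(frame_vecs)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k])


def f1_score(predicted, reference):
    """F1 between a predicted and a reference set of frame indices."""
    predicted, reference = set(predicted), set(reference)
    overlap = len(predicted & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy embeddings: four frames and one query, two dimensions each.
frames = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
query = [1.0, 0.0]
summary = select_frames(query, frames, k=2)
```

With these toy vectors the query aligns with frames 0 and 2, so those two form the summary. Changing the query vector changes the selected frames, which is the behavior a fixed, query-independent summarizer cannot provide.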

updated: Mon Apr 26 2021 10:50:37 GMT+0000 (UTC)

published: Mon Apr 26 2021 10:50:37 GMT+0000 (UTC)

arXiv

