TCR: Short Video Title Generation and Cover Selection with Attention Refinement

Yakun Yu; Jiuding Yang; Weidong Guo; Hui Liu; Yu Xu; Di Niu

With the widespread popularity of user-generated short videos, it has become increasingly challenging for content creators to promote their content to potential viewers. Automatically generating appealing titles and covers for short videos can help grab viewers' attention. Existing studies on video captioning mostly focus on generating factual descriptions of actions, which do not match the style of video titles intended to catch viewers' attention. Furthermore, research on cover selection based on multimodal information is sparse. These problems motivate the need for tailored methods that specifically support the joint task of short video title generation and cover selection (TG-CS), as well as for corresponding datasets to support such studies. In this paper, we first collect and present a real-world dataset named Short Video Title Generation (SVTG) that contains videos with appealing titles and covers. We then propose a Title generation and Cover selection with attention Refinement (TCR) method for TG-CS. The refinement procedure progressively selects high-quality samples, along with highly relevant frames and text tokens within each sample, to refine model training. Extensive experiments show that our TCR method outperforms various existing video captioning methods in generating titles and selects better covers for noisy real-world short videos.

updated: Tue Apr 25 2023 04:08:19 GMT+0000 (UTC)

published: Tue Apr 25 2023 04:08:19 GMT+0000 (UTC)

arXiv

