Video Background Music Generation: Dataset, Method and Evaluation

Le Zhuo; Zhaokai Wang; Baisen Wang; Yue Liao; Stanley Peng; Chenxi Bao; Miao Lu; Xiaobo Li; Si Liu

Music is essential when editing videos, but selecting music manually is difficult and time-consuming. We therefore seek to automatically generate background music tracks given video input. This is a challenging task because learning the correspondence between video and music requires a large amount of paired video and music data. Unfortunately, no such dataset exists. To close this gap, we introduce a dataset, benchmark model, and evaluation metric for video background music generation. We introduce SymMV, a video and symbolic music dataset with chord, rhythm, melody, and accompaniment annotations. To the best of our knowledge, it is the first video-music dataset with high-quality symbolic music and detailed annotations. We also propose a benchmark video background music generation framework named V-MusProd, which utilizes music priors of chords, melody, and accompaniment along with video-music relations of semantic, color, and motion features. To address the lack of objective metrics for video-music correspondence, we propose a retrieval-based metric, VMCP, built upon a powerful video-music representation learning model. Experiments show that with our dataset, V-MusProd outperforms the state-of-the-art method in both music quality and correspondence with videos. We believe our dataset, benchmark model, and evaluation metric will boost the development of video background music generation.

updated: Mon Nov 21 2022 08:39:48 GMT+0000 (UTC)

published: Mon Nov 21 2022 08:39:48 GMT+0000 (UTC)

arXiv
