AutoMatch: A Large-scale Audio Beat Matching Benchmark for Boosting Deep Learning Assistant Video Editing

Sen Pei; Jingya Yu; Qi Chen; Wozhou He

The explosion of short videos has dramatically reshaped the way people socialize, creating a new trend for daily sharing and access to the latest information. These rich video resources benefit, on the one hand, from the spread of portable devices with cameras; on the other, they depend on the valuable editing work contributed by numerous video creators. In this paper, we investigate a novel and practical problem, audio beat matching (ABM), which aims to recommend proper transition timestamps based on the background music. This technique eases the labor-intensive work of video editing, so that creators can focus more on the creativity of the video content. We formally define the ABM problem and its evaluation protocol. We also present a large-scale audio dataset, AutoMatch, with over 87k finely annotated pieces of background music, to facilitate this newly opened research direction. To lay a solid foundation for follow-up studies, we propose a novel model termed BeatX to tackle this challenging task. In addition, we introduce the concept of label scope, which eliminates the data imbalance issue and assigns adaptive weights to the ground truth during training in a single step. Although many short video platforms have flourished for a long time, research on this scenario remains insufficient, and to the best of our knowledge, AutoMatch is the first large-scale dataset to tackle the audio beat matching problem. We hope the released dataset and our competitive baseline will draw more attention to this line of research. The dataset and code will be made publicly available.

updated: Fri Mar 03 2023 12:30:09 GMT+0000 (UTC)

published: Fri Mar 03 2023 12:30:09 GMT+0000 (UTC)

arXiv
