GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation

Ji Qi; Jifan Yu; Teng Tu; Kunyu Gao; Yifan Xu; Xinyu Guan; Xiaozhi Wang; Yuxiao Dong; Bin Xu; Lei Hou; Juanzi Li; Jie Tang; Weidong Guo; Hui Liu; Yu Xu

Despite the recent emergence of video captioning models, generating vivid, fine-grained video descriptions grounded in background knowledge (i.e., long and informative commentary on domain-specific scenes with appropriate reasoning) remains far from solved, even though it has valuable applications such as automatic sports narration. In this paper, we present GOAL, a benchmark of over 8.9k soccer video clips, 22k sentences, and 42k knowledge triples. GOAL poses a challenging new task setting, Knowledge-grounded Video Captioning (KGVC). We also experimentally adapt existing methods to the task to show its difficulty and to identify potential directions for solving this valuable and practical problem.

updated: Sun Mar 26 2023 08:43:36 GMT+0000 (UTC)

published: Sun Mar 26 2023 08:43:36 GMT+0000 (UTC)

arXiv