WildQA: In-the-Wild Video Question Answering

Santiago Castro; Naihao Deng; Pingxuan Huang; Mihai Burzo; Rada Mihalcea

WildQA: 野生のビデオ質問応答

既存のビデオ理解データセットは、主に人間とのやり取りに焦点を当てており、ビデオが屋外で記録される「野生の」設定にはほとんど注意が払われていません。屋外で撮影されたビデオのビデオ理解データセットである WILDQA を提案します。ビデオによる質問応答 (ビデオ QA) に加えて、特定の質問と回答に対する視覚的なサポートを特定する新しいタスク (ビデオ証拠の選択) も導入します。幅広いベースラインモデルを使用した評価を通じて、WILDQA が視覚と言語の研究コミュニティに新たな課題をもたらすことを示します。データセットは https://lit.eecs.umich.edu/wildqa/ で入手できます。

Existing video understanding datasets mostly focus on human interactions, with little attention being paid to the "in the wild" settings, where the videos are recorded outdoors. We propose WILDQA, a video understanding dataset of videos recorded in outside settings. In addition to video question answering (Video QA), we also introduce the new task of identifying visual support for a given question and answer (Video Evidence Selection). Through evaluations using a wide range of baseline models, we show that WILDQA poses new challenges to the vision and language research communities. The dataset is available at https://lit.eecs.umich.edu/wildqa/.

updated: Wed Sep 14 2022 13:54:07 GMT+0000 (UTC)

published: Wed Sep 14 2022 13:54:07 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト