Reading Between the Lanes: Text VideoQA on the Road

George Tom; Minesh Mathew; Sergi Garcia; Dimosthenis Karatzas; C. V. Jawahar

Text and signs along roads give drivers crucial information for safe navigation and situational awareness. Recognizing scene text while in motion is challenging: textual cues typically appear only briefly, so they must be detected early and at a distance. Systems that exploit such information to assist the driver should not only extract and incorporate visual and textual cues from the video stream but also reason over time. To address this issue, we introduce RoadTextVQA, a new dataset for the task of video question answering (VideoQA) in the context of driver assistance. RoadTextVQA consists of 3,222 driving videos collected from multiple countries, annotated with 10,500 questions, all based on text or road signs present in the driving videos. We assess the performance of state-of-the-art video question answering models on our RoadTextVQA dataset, highlighting the significant potential for improvement in this domain and the usefulness of the dataset in advancing research on in-vehicle support systems and text-aware multimodal question answering. The dataset is available at http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtextvqa

updated: Sat Jul 08 2023 10:11:29 GMT+0000 (UTC)

published: Sat Jul 08 2023 10:11:29 GMT+0000 (UTC)

arXiv
