Recent Advances in Video Question Answering: A Review of Datasets and Methods

Devshree Patel; Ratnam Parikh; Yesha Shastri

Video Question Answering (VQA) is a recently emerged, challenging task in the field of Computer Vision. Several visual information retrieval techniques, such as Video Captioning/Description and Video-guided Machine Translation, preceded the task of VQA. VQA helps to retrieve temporal and spatial information from video scenes and to interpret it. In this survey, we review a number of methods and datasets for the task of VQA. To the best of our knowledge, no previous survey has been conducted for the VQA task.

updated: Thu Mar 18 2021 14:30:16 GMT+0000 (UTC)

published: Fri Jan 15 2021 03:26:24 GMT+0000 (UTC)

arXiv
