Transferring Domain-Agnostic Knowledge in Video Question Answering

Tianran Wu; Noa Garcia; Mayu Otani; Chenhui Chu; Yuta Nakashima; Haruo Takemura

Video question answering (VideoQA) aims to answer a given question based on a relevant video clip. Currently available large-scale datasets have made it possible to formulate VideoQA as the joint understanding of visual and language information. However, this training procedure is costly and still falls short of human performance. In this paper, we investigate a transfer learning method that introduces both domain-agnostic knowledge and domain-specific knowledge. First, we develop a novel transfer learning framework that finetunes the pre-trained model using domain-agnostic knowledge as the medium. Second, we construct a new VideoQA dataset with 21,412 human-generated question-answer samples for comparable transfer of knowledge. Our experiments show that (i) domain-agnostic knowledge is transferable and (ii) our proposed transfer learning framework can effectively boost VideoQA performance.
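The abstract relies on the general pattern of pre-training a model on one data distribution and then finetuning it on a related one. The sketch below illustrates only that generic pattern with a toy logistic-regression classifier on synthetic data. It is not the authors' framework: it does not model VideoQA, video or text features, or the "domain-agnostic knowledge as the medium" mechanism, and every name in it (`make_data`, `train`, `source_w`, `target_w`, the epoch counts) is a hypothetical choice made for illustration.

```python
import math
import random

# Toy illustration of generic pre-train-then-finetune transfer (hypothetical; not the paper's method).

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def loss(w, data):
    """Mean binary cross-entropy of weights w on (x, y) pairs."""
    total = 0.0
    for x, y in data:
        p = min(max(predict(w, x), 1e-12), 1 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train(weights, data, lr=0.5, epochs=200):
    """Full-batch gradient descent on logistic loss, starting from a copy of `weights`."""
    w = list(weights)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                grad[i] += err * xi / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

def make_data(rng, true_w, n):
    """Synthetic samples: x = [bias, x1, x2], label from the sign of a hypothetical linear rule."""
    data = []
    for _ in range(n):
        x = [1.0] + [rng.uniform(-1, 1) for _ in range(len(true_w) - 1)]
        y = 1 if sum(a * b for a, b in zip(true_w, x)) > 0 else 0
        data.append((x, y))
    return data

rng = random.Random(0)
source_w = [0.0, 2.0, -1.0]   # hypothetical large "source" task
target_w = [0.1, 2.0, -0.8]   # hypothetical related, data-poor "target" task
source = make_data(rng, source_w, 500)
target_train = make_data(rng, target_w, 20)
target_test = make_data(rng, target_w, 200)

pretrained = train([0.0, 0.0, 0.0], source)               # stage 1: pre-train on plentiful source data
finetuned = train(pretrained, target_train, epochs=5)    # stage 2: short finetune on scarce target data
scratch = train([0.0, 0.0, 0.0], target_train, epochs=5) # baseline: same small budget, no pre-training

ft_loss = loss(finetuned, target_test)
scratch_loss = loss(scratch, target_test)
print(f"finetuned: {ft_loss:.3f}  scratch: {scratch_loss:.3f}")
```

Under these toy assumptions, the finetuned model starts from weights that already encode most of the shared decision rule, so a few target-domain epochs leave it with a lower held-out loss than the same budget spent training from zero.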

updated: Tue Oct 26 2021 03:58:31 GMT+0000 (UTC)

published: Tue Oct 26 2021 03:58:31 GMT+0000 (UTC)

arXiv

参考文献 (このサイトで利用可能なもの) / References (only if available on this site)

被参照文献 (このサイトで利用可能なものを新しい順に) / Citations (only if available on this site, in order of most recent)

Amazon.co.jpアソシエイト