Keyword-Aware Relative Spatio-Temporal Graph Networks for Video Question Answering

Yi Cheng; Hehe Fan; Dongyun Lin; Ying Sun; Mohan Kankanhalli; Joo-Hwee Lim

The main challenge in video question answering (VideoQA) is to capture and understand the complex spatial and temporal relations between objects based on the given question. Existing graph-based methods for VideoQA usually ignore keywords in questions and employ a simple graph to aggregate features without considering relative relations between objects, which may lead to inferior performance. In this paper, we propose a Keyword-aware Relative Spatio-Temporal (KRST) graph network for VideoQA. First, to make question features aware of keywords, we employ an attention mechanism that assigns high weights to keywords during question encoding. The keyword-aware question features are then used to guide video graph construction. Second, because relations are relative, we integrate relative relation modeling to better capture the spatio-temporal dynamics among object nodes. Moreover, we disentangle the spatio-temporal reasoning into an object-level spatial graph and a frame-level temporal graph, which reduces the interference between spatial and temporal relation reasoning. Extensive experiments on the TGIF-QA, MSVD-QA and MSRVTT-QA datasets demonstrate the superiority of our KRST over multiple state-of-the-art methods.
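To make the keyword-weighting idea in the abstract concrete, here is a minimal sketch of attention pooling over per-word question features. It is an illustration under stated assumptions, not the authors' KRST implementation: the function names, the scaled dot-product scoring, the `query` vector (standing in for whatever learned parameters score how keyword-like a word is), and the toy numbers are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(word_feats, query):
    """Pool per-word feature vectors into a single question vector.

    word_feats: list of equal-length float lists, one per question word.
    query: hypothetical scoring vector (an assumption, not from the paper).
    Returns (weights, pooled); weights are non-negative and sum to 1.
    """
    dim = len(query)
    # Scaled dot-product score for each word.
    scores = [
        sum(f * q for f, q in zip(feat, query)) / math.sqrt(dim)
        for feat in word_feats
    ]
    weights = softmax(scores)
    # Weighted sum of word features, one output value per dimension.
    pooled = [
        sum(w * feat[i] for w, feat in zip(weights, word_feats))
        for i in range(dim)
    ]
    return weights, pooled


# Toy question with three words; the second word plays the "keyword" role
# because its feature vector aligns most strongly with the query.
word_feats = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
query = [1.0, 1.0]
weights, pooled = attention_pool(word_feats, query)
```

In this toy setup the second word receives the largest weight, so the pooled question vector is dominated by its features. In a trained model the scoring parameters would be learned end to end rather than fixed by hand.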

updated: Tue Jul 25 2023 04:41:32 GMT+0000 (UTC)

published: Tue Jul 25 2023 04:41:32 GMT+0000 (UTC)

arXiv
