Adaptive Hierarchical Graph Reasoning with Semantic Coherence for Video-and-Language Inference

Juncheng Li; Siliang Tang; Linchao Zhu; Haochen Shi; Xuanwen Huang; Fei Wu; Yi Yang; Yueting Zhuang

Video-and-Language Inference is a recently proposed task for joint video-and-language understanding. This new task requires a model to infer whether a natural language statement is entailed or contradicted by a given video clip. In this paper, we study how to address three critical challenges for this task: judging the global correctness of a statement that involves multiple semantic meanings, joint reasoning over video and subtitles, and modeling long-range relationships and complex social interactions. First, we propose an adaptive hierarchical graph network that achieves an in-depth understanding of the video across complex interactions. Specifically, it performs joint reasoning over video and subtitles at three hierarchies, where the graph structure is adaptively adjusted according to the semantic structure of the statement. Second, we introduce semantic coherence learning to explicitly encourage the semantic coherence of the adaptive hierarchical graph network across the three hierarchies. Semantic coherence learning can further improve the alignment between vision and language, as well as the coherence across a sequence of video segments. Experimental results show that our method outperforms the baseline by a large margin.
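
The abstract does not say how the graph structure is adjusted to the statement. Purely as an illustration, and not the authors' method, one generic way to make graph edges depend on a statement is to gate each node's features with a statement embedding before computing attention-style edge weights, then run a message-passing step over the resulting adjacency. The function names, the elementwise gating, and the scaled dot-product scoring below are all assumptions; video and subtitle nodes are represented as plain feature vectors.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def statement_conditioned_adjacency(nodes: List[Vector], statement: Vector) -> Matrix:
    """Hypothetical edge weights A[i][j] = softmax_j(((h_i * s) . h_j) / sqrt(d)).

    Each node's features h_i are gated elementwise by the statement embedding s,
    so the resulting graph structure changes when the statement changes.
    """
    scale = 1.0 / math.sqrt(len(statement))
    adjacency: Matrix = []
    for h_i in nodes:
        # Keep only the feature dimensions the statement "cares about".
        gated = [a * b for a, b in zip(h_i, statement)]
        scores = [scale * sum(g * x for g, x in zip(gated, h_j)) for h_j in nodes]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(score - peak) for score in scores]
        total = sum(exps)
        adjacency.append([e / total for e in exps])  # each row sums to 1
    return adjacency


def propagate(nodes: List[Vector], adjacency: Matrix) -> List[Vector]:
    """One message-passing step: h_i' = sum_j A[i][j] * h_j."""
    dim = len(nodes[0])
    return [
        [sum(row[j] * nodes[j][k] for j in range(len(nodes))) for k in range(dim)]
        for row in adjacency
    ]


if __name__ == "__main__":
    # Hypothetical toy nodes (e.g. two video segments and one subtitle line).
    nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    adj = statement_conditioned_adjacency(nodes, [1.0, 0.0])
    print(adj[1])  # node 1 shares no features with the statement: each weight is 1/3
    print(propagate(nodes, adj))
```

With the statement `[1.0, 0.0]`, node 1 (`[0.0, 1.0]`) has zero gated features, so it attends uniformly to every node; passing a different statement produces a different adjacency, which is the statement-dependent behaviour this sketch is meant to illustrate.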

updated: Mon Aug 09 2021 08:50:13 GMT+0000 (UTC)

published: Mon Jul 26 2021 15:23:19 GMT+0000 (UTC)

arXiv