Revitalize Region Feature for Democratizing Video-Language Pre-training

Guanyu Cai; Yixiao Ge; Alex Jinpeng Wang; Rui Yan; Xudong Lin; Ying Shan; Lianghua He; Xiaohu Qie; Jianping Wu; Mike Zheng Shou

Recent dominant methods for video-language pre-training (VLP) learn transferable representations from raw pixels in an end-to-end manner to achieve advanced performance on downstream video-language tasks. Despite the impressive results, VLP research has become extremely expensive, requiring massive data and long training times, which prevents further exploration. In this work, we revitalize region features of sparsely sampled video clips to significantly reduce both spatial and temporal visual redundancy, moving toward democratizing VLP research while still achieving state-of-the-art results. Specifically, to fully explore the potential of region features, we introduce a novel bidirectional region-word alignment regularization that properly optimizes the fine-grained relations between regions and certain words in sentences, eliminating the domain/modality disconnections between pre-extracted region features and text. Extensive results on downstream text-to-video retrieval and video question answering tasks across seven datasets demonstrate the superiority of our method in both effectiveness and efficiency; e.g., our method achieves competitive results with 80% less data and 85% less pre-training time compared to the most efficient VLP method so far. The code will be available at https://github.com/CuthbertCai/DemoVLP.
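The abstract does not give the formulation of the bidirectional region-word alignment regularization. As a rough illustration of what "bidirectional" alignment between regions and words can mean, the sketch below uses a generic max-similarity matching in both directions. This is an assumption for illustration, not the authors' loss; the function names `cosine` and `bidirectional_alignment` and the toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def bidirectional_alignment(regions, words):
    """Hypothetical alignment score (not the paper's formulation).

    Region-to-word: each region is matched to its most similar word.
    Word-to-region: each word is matched to its most similar region.
    The two directional means are averaged into one score.
    """
    # Pairwise similarity matrix: rows are regions, columns are words.
    sim = [[cosine(r, w) for w in words] for r in regions]
    region_to_word = sum(max(row) for row in sim) / len(regions)
    word_to_region = sum(
        max(sim[i][j] for i in range(len(regions))) for j in range(len(words))
    ) / len(words)
    return 0.5 * (region_to_word + word_to_region)


# Toy 2-D embeddings (assumed, for illustration only).
aligned = bidirectional_alignment([[1, 0], [0, 1]], [[0, 1], [1, 0]])
mismatched = bidirectional_alignment([[1, 0]], [[0, 1]])
print(aligned, mismatched)
```

Under these assumptions, a regularizer could penalize a low score (for example, by adding one minus the score to the training objective), which would push each region toward some word and each word toward some region.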

updated: Tue Mar 15 2022 08:18:27 GMT+0000 (UTC)

published: Tue Mar 15 2022 08:18:27 GMT+0000 (UTC)

arXiv
