ViSeRet: A simple yet effective approach to moment retrieval via fine-grained video segmentation

Aiden Seungjoon Lee; Hanseok Oh; Minjoon Seo

Video-text retrieval has many real-world applications, such as media analytics, surveillance, and robotics. This paper presents the first-place solution to the video retrieval track of the ICCV VALUE Challenge 2021. We present a simple yet effective approach that jointly tackles two video-text retrieval tasks (video retrieval and video corpus moment retrieval) by leveraging a model trained only on the video retrieval task. In addition, we create an ensemble model that achieves new state-of-the-art performance on all four datasets presented in the VALUE Challenge (TVr, How2r, YouCook2r, and VATEXr).
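The abstract does not describe the mechanism in detail. As a rough illustration of how a retrieval-only model could serve moment retrieval through fine-grained segmentation, the sketch below cuts each video into short windows, scores every window against the query, and returns the best-scoring window as the moment. The function names, the fixed window length, the cosine scoring, and the toy embeddings are all assumptions made for this example and are not taken from the paper.

```python
from math import sqrt


def split_into_segments(duration, segment_len):
    """Cut [0, duration] into consecutive windows of at most segment_len seconds."""
    segments = []
    start = 0.0
    while start < duration:
        end = min(start + segment_len, duration)
        segments.append((start, end))
        start = end
    return segments


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_moments(query_vec, corpus, top_k=1):
    """Rank every segment of every video against the query.

    corpus maps a video id to a list of ((start, end), embedding) pairs.
    In a real system the embeddings would come from a video retrieval
    model; here they are hypothetical placeholders.
    """
    scored = []
    for video_id, segments in corpus.items():
        for (start, end), embedding in segments:
            scored.append((cosine(query_vec, embedding), video_id, start, end))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Toy corpus: two videos, each cut into windows of at most 4 seconds.
windows_a = split_into_segments(8.0, 4.0)
windows_b = split_into_segments(10.0, 4.0)
corpus = {
    "video_a": list(zip(windows_a, [[0.0, 1.0], [0.2, 0.9]])),
    "video_b": list(zip(windows_b, [[0.1, 1.0], [1.0, 0.1], [0.5, 0.5]])),
}
best = retrieve_moments([1.0, 0.0], corpus, top_k=1)[0]
```

Under these assumptions, a video-level score for plain video retrieval could be taken as the maximum segment score within each video, so the same scoring function would cover both tasks.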

updated: Tue Oct 12 2021 10:29:37 GMT+0000 (UTC)

published: Mon Oct 11 2021 10:39:13 GMT+0000 (UTC)

arXiv
