Single-Stage Visual Query Localization in Egocentric Videos

Hanwen Jiang; Santhosh Kumar Ramakrishnan; Kristen Grauman

Visual Query Localization (VQL) on long-form egocentric videos requires spatio-temporal search and localization of visually specified objects and is vital for building episodic memory systems. Prior work develops complex multi-stage pipelines that leverage well-established object detection and tracking methods to perform VQL. However, each stage is trained independently, and the complexity of the pipeline results in slow inference. We propose VQLoC, a novel single-stage VQL framework that is end-to-end trainable. Our key idea is to first build a holistic understanding of the query-video relationship and then perform spatio-temporal localization in a single-shot manner. Specifically, we establish the query-video relationship by jointly considering query-to-frame correspondences between the query and each video frame and frame-to-frame correspondences between nearby video frames. Our experiments demonstrate that our approach outperforms prior VQL methods by 20% in accuracy while obtaining a 10x improvement in inference speed. VQLoC is also the top entry on the Ego4D VQ2D challenge leaderboard. Project page: https://hwjiang1510.github.io/VQLoC/
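
The two kinds of correspondence named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes, purely for illustration, that both correspondences are computed with plain dot-product attention over precomputed feature tokens, and every name in it (`query_to_frame`, `frame_to_frame`, `query_video_features`, `window`) is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def query_to_frame(frame_feats, query_feats):
    """Query-to-frame step (assumed form): each frame token attends to the query tokens.

    frame_feats: (T, N, D) array, T frames with N tokens each.
    query_feats: (M, D) array of tokens from the visual query crop.
    Returns a (T, N, D) array of query-conditioned frame features.
    """
    d = frame_feats.shape[-1]
    scores = frame_feats @ query_feats.T / np.sqrt(d)  # (T, N, M)
    return softmax(scores, axis=-1) @ query_feats      # (T, N, D)

def frame_to_frame(feats, window=2):
    """Frame-to-frame step (assumed form): tokens of frame t attend to
    tokens of frames within `window` steps of t, i.e. nearby frames only.

    feats: (T, N, D) array. Returns a (T, N, D) array.
    """
    T, N, D = feats.shape
    out = np.empty_like(feats)
    for t in range(T):
        lo, hi = max(0, t - window), min(T, t + window + 1)
        ctx = feats[lo:hi].reshape(-1, D)               # tokens of nearby frames
        scores = feats[t] @ ctx.T / np.sqrt(D)          # (N, (hi - lo) * N)
        out[t] = softmax(scores, axis=-1) @ ctx
    return out

def query_video_features(frame_feats, query_feats, window=2):
    # Combine both steps; a prediction head (not shown) would then
    # localize the object in every frame in one pass.
    return frame_to_frame(query_to_frame(frame_feats, query_feats), window)
```

With random inputs of shape `(6, 4, 8)` for the frames and `(3, 8)` for the query, `query_video_features` returns an array of shape `(6, 4, 8)`: one feature vector per frame token, informed by both the query and the neighboring frames.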

updated: Thu Jun 15 2023 17:57:28 GMT+0000 (UTC)

published: Thu Jun 15 2023 17:57:28 GMT+0000 (UTC)

arXiv
