TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

Carl Doersch; Yi Yang; Mel Vecerik; Dilara Gokay; Ankush Gupta; Yusuf Aytar; Joao Carreira; Andrew Zisserman

We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate match for the query point in each of the other frames, and (2) a refinement stage, which updates both the trajectory and the query features based on local correlations. The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an absolute improvement of approximately 20% in average Jaccard (AJ) on DAVIS. Our model facilitates fast inference on long, high-resolution video sequences. On a modern GPU, our implementation can track points faster than real time. Visualizations, source code, and pretrained models can be found on our project webpage.
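To make the two-stage idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the dot-product similarity, the hard argmax, the fixed search window, and the averaging rule for the query feature are all assumptions chosen for illustration. The abstract states only that matching is done independently per frame and that refinement uses local correlations to update both the trajectory and the query features.

```python
import numpy as np

def match_per_frame(query_feat, frame_feats):
    """Stage 1 (toy): match the query in every frame independently.

    query_feat:  (C,) feature of the query point (hypothetical input).
    frame_feats: (T, H, W, C) per-frame feature maps (hypothetical input).
    Returns an integer array of shape (T, 2) with one (row, col) per frame.
    """
    T, H, W, _ = frame_feats.shape
    # Dot-product similarity between the query and every location.
    scores = frame_feats @ query_feat              # (T, H, W)
    flat = scores.reshape(T, -1).argmax(axis=1)    # best location per frame
    return np.stack(np.unravel_index(flat, (H, W)), axis=1)

def refine_once(query_feat, frame_feats, track, radius=1, mix=0.5):
    """Stage 2 (toy): one refinement step driven by local correlations.

    Each position moves to the best match inside a small window around it,
    then the query feature is blended with the mean feature along the new
    track. Both rules are illustrative stand-ins, not the paper's method.
    """
    T, H, W, _ = frame_feats.shape
    new_track = track.copy()
    for t in range(T):
        r, c = track[t]
        r0, r1 = max(r - radius, 0), min(r + radius + 1, H)
        c0, c1 = max(c - radius, 0), min(c + radius + 1, W)
        # Correlate the query with the window around the current estimate.
        local = frame_feats[t, r0:r1, c0:c1] @ query_feat
        dr, dc = np.unravel_index(local.argmax(), local.shape)
        new_track[t] = (r0 + dr, c0 + dc)
    # Gather the features under the updated track, shape (T, C).
    track_feats = frame_feats[np.arange(T), new_track[:, 0], new_track[:, 1]]
    new_query = (1 - mix) * query_feat + mix * track_feats.mean(axis=0)
    return new_track, new_query
```

In this sketch, `match_per_frame` would supply the initial track and `refine_once` could be applied repeatedly to it. Because the matching step treats frames independently, it can be computed for all frames in one batched operation.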

updated: Wed Jun 14 2023 17:07:51 GMT+0000 (UTC)

published: Wed Jun 14 2023 17:07:51 GMT+0000 (UTC)

arXiv
