Rethinking Self-supervised Correspondence Learning: A Video Frame-level Similarity Perspective

Jiarui Xu; Xiaolong Wang

Learning a good representation for space-time correspondence is key to various computer vision tasks, including tracking object bounding boxes and pixel-level video object segmentation. To learn generalizable correspondence representations at scale, a variety of self-supervised pretext tasks have been proposed that explicitly perform object-level or patch-level similarity learning. Instead of following this literature, we propose to learn correspondence using Video Frame-level Similarity (VFS) learning, i.e., simply learning from comparing video frames. Our work is inspired by the recent success of image-level contrastive learning and similarity learning for visual recognition. Our hypothesis is that if a representation is good for recognition, its convolutional features must be able to find correspondence between similar objects or parts. Our experiments show the surprising result that VFS surpasses state-of-the-art self-supervised approaches on both OTB visual object tracking and DAVIS video object segmentation. We perform a detailed analysis of what matters in VFS and reveal new properties of image-level and frame-level similarity learning. The project page with code is available at https://jerryxu.net/VFS
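The idea of learning from whole-frame comparison and then reading correspondence off the convolutional features can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each frame is already encoded as a list of per-location feature vectors, uses plain negative cosine similarity between average-pooled frame embeddings as a stand-in for the frame-level objective, and uses nearest-neighbor matching as a stand-in for correspondence. All function names (`frame_embedding`, `frame_similarity_loss`, `match_locations`) are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (a tiny epsilon avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in v)) + 1e-12
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def frame_embedding(feature_map):
    """Global average pooling: one vector per frame from its per-location features."""
    n = len(feature_map)
    dim = len(feature_map[0])
    return [sum(loc[d] for loc in feature_map) / n for d in range(dim)]

def frame_similarity_loss(feat_a, feat_b):
    """Negative cosine similarity between the pooled embeddings of two frames.
    Assumption: a stand-in objective; lower means the two frames look more alike."""
    return -cosine(frame_embedding(feat_a), frame_embedding(feat_b))

def match_locations(feat_ref, feat_tgt):
    """For each target location, return the index of the most similar reference location."""
    return [
        max(range(len(feat_ref)), key=lambda i: cosine(feat_ref[i], t))
        for t in feat_tgt
    ]

# Toy data (made up): frame_b holds roughly the same three features as frame_a,
# but at shifted locations, as if the scene content had moved between frames.
frame_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame_b = [[0.0, 0.9], [1.1, 1.0], [0.9, 0.1]]

loss = frame_similarity_loss(frame_a, frame_b)
matches = match_locations(frame_a, frame_b)
print(loss, matches)
```

In this toy case the pooled embeddings of the two frames nearly coincide, so the frame-level loss is close to its minimum, while the per-location features still recover which location moved where. That is the intuition behind the hypothesis stated above, shown here only on hand-made data.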

updated: Thu Oct 14 2021 01:44:48 GMT+0000 (UTC)

published: Wed Mar 31 2021 17:56:35 GMT+0000 (UTC)

arXiv

