Nearest-Neighbor Inter-Intra Contrastive Learning from Unlabeled Videos

David Fan; Deyu Yang; Xinyu Li; Vimal Bhat; Rohith MV

Contrastive learning has recently narrowed the gap between self-supervised and supervised methods in the image and video domains. State-of-the-art video contrastive learning methods such as CVRL and ρ-MoCo spatiotemporally augment two clips from the same video and treat them as positives. Because they sample positive clips only locally, from a single video, these methods neglect other semantically related videos that could also be useful. To address this limitation, we leverage nearest-neighbor videos from the global space as additional positive pairs, which improves positive key diversity and introduces a more relaxed notion of similarity that extends beyond video and even class boundaries. Our method, Inter-Intra Video Contrastive Learning (IIVCL), improves performance on a range of video tasks.
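The idea of combining intra-video positives with nearest-neighbor positives can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the loss, so the sketch assumes an InfoNCE objective with in-batch negatives, a hypothetical `bank` of embeddings from other videos that is used only for nearest-neighbor lookup, and a hypothetical `inter_weight` that balances the two terms. All function names are illustrative.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def nearest_neighbor_indices(keys, bank):
    """Index of the most similar bank row for each key (rows assumed unit-norm)."""
    return np.argmax(keys @ bank.T, axis=1)


def diagonal_info_nce(logits):
    """Cross-entropy where row i's positive is column i; other columns are negatives."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


def inter_intra_loss(q, k, bank, temperature=0.1, inter_weight=1.0):
    """Assumed combination: intra-video InfoNCE plus a nearest-neighbor InfoNCE term.

    q, k : (B, D) embeddings of two augmented clips from the same B videos.
    bank : (N, D) embeddings of other videos (hypothetical support set).
    """
    q, k, bank = l2_normalize(q), l2_normalize(k), l2_normalize(bank)

    # Intra-video term: the positive for q[i] is the second clip k[i].
    intra = diagonal_info_nce(q @ k.T / temperature)

    # Inter-video term: the positive for q[i] is the bank entry closest to k[i].
    nn_keys = bank[nearest_neighbor_indices(k, bank)]
    inter = diagonal_info_nce(q @ nn_keys.T / temperature)

    return intra + inter_weight * inter


# Toy usage with random embeddings standing in for encoder outputs.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16))
k = q + 0.1 * rng.normal(size=(4, 16))   # second clip: a perturbed view of the first
bank = rng.normal(size=(32, 16))
loss = inter_intra_loss(q, k, bank)
```

Setting `inter_weight=0.0` reduces the sketch to a purely intra-video contrastive loss, which makes the contribution of the nearest-neighbor term easy to isolate in an experiment.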

updated: Mon Mar 13 2023 17:38:58 GMT+0000 (UTC)

published: Mon Mar 13 2023 17:38:58 GMT+0000 (UTC)

arXiv
