Pixel-level Correspondence for Self-Supervised Learning from Video

Yash Sharma; Yi Zhu; Chris Russell; Thomas Brox

While self-supervised learning has enabled effective representation learning in the absence of labels, for vision, video remains a relatively untapped source of supervision. To address this, we propose Pixel-level Correspondence (PiCo), a method for dense contrastive learning from video. By tracking points with optical flow, we obtain a correspondence map which can be used to match local features at different points in time. We validate PiCo on standard benchmarks, outperforming self-supervised baselines on multiple dense prediction tasks, without compromising performance on image classification.
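
The abstract describes the mechanism only at a high level. What follows is a minimal, hypothetical sketch of the two steps it names: chaining optical flow to track points into a correspondence map, then a dense contrastive loss over the matched features. It assumes dense per-frame forward flow is already available. It uses nearest-neighbour flow lookup and a standard InfoNCE loss in which the other tracked points serve as negatives. The helper names `track_points` and `dense_info_nce`, the layout `flow[y][x] = (dx, dy)`, and the temperature of 0.1 are illustrative assumptions, not PiCo's published implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical layouts (assumptions, not taken from the paper):
#   flow[y][x]  = (dx, dy) displacement of pixel (y, x) from frame t to frame t+1
#   feats[y][x] = L2-normalised feature vector at pixel (y, x)
Flow = Sequence[Sequence[Tuple[float, float]]]
Feats = Sequence[Sequence[Sequence[float]]]
Pixel = Tuple[int, int]


def track_points(flows: List[Flow], height: int, width: int) -> Dict[Pixel, Pixel]:
    """Chain per-frame flows to map pixels of the first frame to the last frame.

    Uses nearest-neighbour flow lookup (an assumption; bilinear sampling is
    also common). Points whose track leaves the image are dropped.
    """
    corr: Dict[Pixel, Pixel] = {}
    for y in range(height):
        for x in range(width):
            px, py = float(x), float(y)
            valid = True
            for flow in flows:
                ix, iy = int(round(px)), int(round(py))
                if not (0 <= ix < width and 0 <= iy < height):
                    valid = False  # track left the frame before this step
                    break
                dx, dy = flow[iy][ix]
                px, py = px + dx, py + dy
            ix, iy = int(round(px)), int(round(py))
            if valid and 0 <= ix < width and 0 <= iy < height:
                corr[(y, x)] = (iy, ix)
    return corr


def dense_info_nce(feats_a: Feats, feats_b: Feats,
                   corr: Dict[Pixel, Pixel], temperature: float = 0.1) -> float:
    """InfoNCE over tracked points: each first-frame feature should match the
    feature at its tracked location; other tracked locations act as negatives."""
    pairs = list(corr.items())
    if not pairs:
        raise ValueError("no valid correspondences")
    keys = [feats_b[ty][tx] for _, (ty, tx) in pairs]
    total = 0.0
    for i, ((sy, sx), _) in enumerate(pairs):
        query = feats_a[sy][sx]
        logits = [sum(q * k for q, k in zip(query, key)) / temperature for key in keys]
        top = max(logits)  # log-sum-exp shift for numerical stability
        log_denom = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive pair
    return total / len(pairs)
```

Nearest-neighbour lookup keeps the sketch dependency-free. A practical version would operate on batched tensors and subsample tracked points, details the abstract does not specify.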

updated: Fri Jul 08 2022 12:50:13 GMT+0000 (UTC)

published: Fri Jul 08 2022 12:50:13 GMT+0000 (UTC)

arXiv