Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Learning

Li Tao; Xueting Wang; Toshihiko Yamasaki

Recently, pretext-task-based methods have been proposed one after another for self-supervised video feature learning. Meanwhile, contrastive learning methods also yield good performance. New methods usually beat previous ones, with the claim that they capture "better" temporal information. However, their experimental settings differ, which makes it hard to conclude which method is actually better. Comparisons would be much more convincing if each method had been pushed as close to its performance limit as possible. In this paper, we start from one pretext-task baseline and explore how far it can go when combined with contrastive learning, data pre-processing, and data augmentation. Extensive experiments reveal a proper setting that achieves large improvements over the baselines, indicating that a joint optimization framework can boost both the pretext task and contrastive learning. We denote this joint optimization framework as Pretext-Contrastive Learning (PCL). Two other pretext-task baselines are used to validate the effectiveness of PCL. Under the same training setting, we also easily outperform current state-of-the-art methods, showing the effectiveness and generality of our proposal. PCL can conveniently be treated as a standard training strategy and applied to many other works in self-supervised video feature learning.
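The abstract does not spell out the form of the joint objective. As a rough illustration only, the sketch below assumes it is a weighted sum of a pretext-task classification loss (softmax cross-entropy) and an InfoNCE contrastive loss, where two augmented clips of the same video form the positive pair and the other videos in the batch act as negatives. The names `joint_loss`, `info_nce`, and `contrast_weight`, and the temperature value 0.07, are hypothetical choices, not taken from the paper; the code works on toy embeddings rather than a real video backbone.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for a single sample, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(queries: List[Vector], keys: List[Vector], temperature: float = 0.07) -> float:
    """InfoNCE over a batch: queries[i] and keys[i] embed two augmented clips of the
    same video (the positive pair); every other key in the batch is a negative."""
    n = len(queries)
    total = 0.0
    for i in range(n):
        logits = [cosine(queries[i], keys[j]) / temperature for j in range(n)]
        total += cross_entropy(logits, i)  # the positive sits at index i
    return total / n


def joint_loss(
    pretext_logits: List[Sequence[float]],
    pretext_targets: List[int],
    queries: List[Vector],
    keys: List[Vector],
    contrast_weight: float = 1.0,
    temperature: float = 0.07,
) -> float:
    """Hypothetical joint objective: mean pretext-task cross-entropy plus a weighted InfoNCE term."""
    pretext = sum(
        cross_entropy(logits, t) for logits, t in zip(pretext_logits, pretext_targets)
    ) / len(pretext_targets)
    contrast = info_nce(queries, keys, temperature)
    return pretext + contrast_weight * contrast


if __name__ == "__main__":
    # Toy batch of two videos: logits from a hypothetical 4-way pretext classifier,
    # plus 2-D embeddings of two augmented clips per video.
    logits = [[2.0, 0.1, 0.1, 0.1], [0.1, 0.1, 2.0, 0.1]]
    targets = [0, 2]
    q = [[1.0, 0.0], [0.0, 1.0]]
    k = [[0.9, 0.1], [0.1, 0.9]]
    print(joint_loss(logits, targets, q, k))
```

Keeping the two terms as a weighted sum makes ablation straightforward in this sketch: setting `contrast_weight=0.0` recovers the pretext-only baseline, so the contribution of the contrastive term can be measured directly.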

updated: Sun Apr 04 2021 14:42:00 GMT+0000 (UTC)

published: Thu Oct 29 2020 10:20:35 GMT+0000 (UTC)

arXiv
