Similarity Contrastive Estimation for Self-Supervised Soft Contrastive Learning

Julien Denize; Jaonary Rabarisoa; Astrid Orcesi; Romain Hérault; Stéphane Canu

Contrastive representation learning has proven to be an effective self-supervised learning method. Most successful approaches are based on Noise Contrastive Estimation (NCE) and use different views of an instance as positives that should be contrasted with other instances, called negatives, which are treated as noise. However, several instances in a dataset are drawn from the same distribution and share underlying semantic information. A good data representation should capture the relations, or semantic similarity, between instances. Contrastive learning implicitly learns relations, but treating all negatives as noise harms the quality of the learned relations. To circumvent this issue, we propose Similarity Contrastive Estimation (SCE), a novel formulation of contrastive learning that uses the semantic similarity between instances. Our training objective is a soft contrastive one. Instead of hard-classifying positives and negatives, we estimate a continuous distribution from one view of a batch and use it to push or pull instances according to their semantic similarity. This target similarity distribution is sharpened to eliminate noisy relations. For each instance, the model predicts the target distribution from another view while contrasting its positive with negatives. Experimental results show that SCE ranks first on the ImageNet linear evaluation protocol at 100 pretraining epochs with 72.1% accuracy and is competitive with state-of-the-art algorithms, reaching 75.4% at 200 epochs with multi-crop. We also show that SCE generalizes to several tasks. Source code is available at https://github.com/CEA-LIST/SCE.
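The soft contrastive objective described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the abstract gives no formulas, so the function name `soft_contrastive_loss`, the cosine similarity, the two temperatures `tau` and `tau_target`, and the mixing weight `lam` between the one-hot positive and the sharpened similarity distribution are all assumptions made for the example. The reference code is in the repository linked above.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row maximum for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def soft_contrastive_loss(z_pred, z_target, tau=0.1, tau_target=0.05, lam=0.5):
    """Hypothetical soft contrastive loss for a batch of N >= 2 instances.

    z_pred   : (N, D) embeddings of the view used for prediction.
    z_target : (N, D) embeddings of the view used to build the target.
    Row i of both arrays comes from the same instance (the positive pair).
    """
    z_pred = l2_normalize(z_pred)
    z_target = l2_normalize(z_target)
    n = z_target.shape[0]

    # Similarities between instances inside the target view. An instance's
    # similarity to itself is masked out so it carries no relation mass.
    sim_target = z_target @ z_target.T
    np.fill_diagonal(sim_target, -np.inf)

    # A low temperature sharpens the distribution (assumed: tau_target < tau).
    relations = np.exp(log_softmax(sim_target / tau_target))

    # Assumed soft target: a mix of the one-hot positive and the relations.
    target = lam * np.eye(n) + (1.0 - lam) * relations

    # Prediction: similarity of each instance in one view to every instance
    # in the other view, turned into log-probabilities.
    log_pred = log_softmax((z_pred @ z_target.T) / tau)

    # Cross-entropy between the soft target and the predicted distribution.
    loss = -(target * log_pred).sum(axis=1).mean()
    return loss, target


rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 16))
view_b = view_a + 0.1 * rng.normal(size=(8, 16))  # stand-in for a second view
loss, target = soft_contrastive_loss(view_a, view_b)
```

In this sketch, setting `lam=1.0` reduces the target to the one-hot positive, which gives a hard contrastive cross-entropy, while smaller values move probability mass toward instances that the target view judges to be semantically similar.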

updated: Thu Sep 29 2022 08:19:09 GMT+0000 (UTC)

published: Mon Nov 29 2021 15:19:15 GMT+0000 (UTC)

arXiv
