Toward Understanding the Feature Learning Process of Self-supervised Contrastive Learning

Zixin Wen; Yuanzhi Li

How can neural networks trained by contrastive learning extract features from unlabeled data? Why does contrastive learning usually need much stronger data augmentations than supervised learning to ensure good representations? These questions involve both the optimization and statistical aspects of deep learning, but can hardly be answered by analyzing supervised learning, where fitting the target function is the primary goal. Indeed, in self-supervised learning, it is inevitable to relate the optimization/generalization of neural networks to how they can encode the latent structures in the data, which we refer to as the feature learning process. In this work, we formally study how contrastive learning learns the feature representations for neural networks by analyzing its feature learning process. We consider the case where our data are composed of two types of features: the more semantically aligned sparse features, which we want to learn, and the other dense features, which we want to avoid. Theoretically, we prove that contrastive learning using ReLU networks learns the desired sparse features if proper augmentations are adopted. We present an underlying principle called feature decoupling to explain the effects of augmentations: we theoretically characterize how augmentations can reduce the correlations of dense features between positive samples while keeping the correlations of sparse features intact, thereby forcing the neural networks to learn from the self-supervision of sparse features. Empirically, we verify that the feature decoupling principle matches the underlying mechanism of contrastive learning in practice.
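
The feature decoupling idea in the abstract can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and is not stated in the abstract: the toy data model (a single sparse feature direction `m` plus Gaussian dense noise), the choice of complementary random masking as the augmentation, and all names and parameter values. It only measures how correlated the sparse and dense parts of two positive views are after masking; it does not train a network or reproduce the paper's analysis.

```python
import numpy as np


def cross_view_correlations(n=4000, d=256, noise_std=1.0, seed=0):
    """Return (sparse_corr, dense_corr) between two masked positive views.

    Toy setting (assumed for illustration): each sample is z * m + noise,
    where m is one sparse-feature direction and noise is dense Gaussian.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical sparse-feature direction: unit norm, spread over all coordinates.
    m = rng.choice([-1.0, 1.0], size=d) / np.sqrt(d)

    # Sparse latent code: zero for most samples, +1 or -1 otherwise.
    z = rng.choice([-1.0, 0.0, 1.0], size=n, p=[0.1, 0.8, 0.1])

    signal = z[:, None] * m[None, :]                 # sparse-feature part
    noise = noise_std * rng.standard_normal((n, d))  # dense part

    # Complementary random masks: every coordinate is kept in exactly one view.
    keep = rng.integers(0, 2, size=(n, d)).astype(float)

    def project(part, mask):
        # Rescale by 2 to compensate for masking, then read out along m.
        return (2.0 * mask * part) @ m

    sparse_corr = np.corrcoef(project(signal, keep),
                              project(signal, 1.0 - keep))[0, 1]
    dense_corr = np.corrcoef(project(noise, keep),
                             project(noise, 1.0 - keep))[0, 1]
    return sparse_corr, dense_corr


if __name__ == "__main__":
    sparse_corr, dense_corr = cross_view_correlations()
    print(f"sparse-feature correlation across views: {sparse_corr:.3f}")
    print(f"dense-feature correlation across views:  {dense_corr:.3f}")
```

In this toy setting, two identical unaugmented views would be perfectly correlated in both parts. With complementary masks, the dense parts of the two views come from disjoint coordinates of independent noise, so their correlation should be close to zero, while the spread-out sparse feature is still visible in both views and stays strongly correlated.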

updated: Mon May 31 2021 16:42:09 GMT+0000 (UTC)

published: Mon May 31 2021 16:42:09 GMT+0000 (UTC)

arXiv
