Contrastive Learning with Complex Heterogeneity

Lecheng Zheng; Jinjun Xiong; Yada Zhu; Jingrui He

With the advent of big data across multiple high-impact applications, we often face the challenge of complex heterogeneity. Newly collected data usually consist of multiple modalities and are characterized by multiple labels, thus exhibiting the co-existence of multiple types of heterogeneity. Although state-of-the-art techniques are good at modeling complex heterogeneity given sufficient label information, such label information can be quite expensive to obtain in real applications. Recently, researchers have paid great attention to contrastive learning because of its strong performance when exploiting rich unlabeled data. However, existing work on contrastive learning does not address the problem of false negative pairs, i.e., some 'negative' pairs may have similar representations if they have the same label. To overcome this issue, we propose a unified heterogeneous learning framework that combines a weighted unsupervised contrastive loss with a weighted supervised contrastive loss to model multiple types of heterogeneity. We first provide a theoretical analysis showing that the vanilla contrastive learning loss easily leads to a sub-optimal solution in the presence of false negative pairs, whereas the proposed weighted loss automatically adjusts the weights based on the similarity of the learned representations to mitigate this issue. Experimental results on real-world data sets demonstrate the effectiveness and efficiency of the proposed framework in modeling multiple types of heterogeneity.
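
The abstract does not state the form of the weighted loss, so the following is only a minimal sketch of the general idea: an InfoNCE-style loss for a single anchor in which each negative pair is scaled by a weight that shrinks as its similarity to the anchor grows. The weighting rule `(1 - cosine) / 2`, the `temperature` value, and all function names are assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.5, weighted=False):
    """InfoNCE-style loss for one anchor.

    weighted=False: vanilla loss, every negative counts equally.
    weighted=True:  each negative is scaled by (1 - cosine) / 2, a
                    hypothetical weight in [0, 1] that is small for
                    negatives that look like the anchor (likely false
                    negatives). This rule is an illustrative assumption,
                    not the weighting proposed in the paper.
    """
    pos_term = math.exp(cosine(anchor, positive) / temperature)
    neg_sum = 0.0
    for neg in negatives:
        sim = cosine(anchor, neg)
        weight = (1.0 - sim) / 2.0 if weighted else 1.0
        neg_sum += weight * math.exp(sim / temperature)
    return -math.log(pos_term / (pos_term + neg_sum))


# Toy data: the first "negative" points almost the same way as the anchor,
# as a false negative with the same label might.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.95, 0.05], [-1.0, 0.2]]

vanilla = contrastive_loss(anchor, positive, negatives, weighted=False)
reweighted = contrastive_loss(anchor, positive, negatives, weighted=True)
print(f"vanilla loss:  {vanilla:.4f}")
print(f"weighted loss: {reweighted:.4f}")
```

In this toy setting the near-duplicate negative dominates the vanilla denominator, while the weighted variant largely discounts it. That is the intuition behind down-weighting suspected false negatives, though the actual framework may compute its weights differently.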

updated: Thu Jul 21 2022 12:58:21 GMT+0000 (UTC)

published: Wed May 19 2021 21:01:41 GMT+0000 (UTC)

arXiv
