MIO: Mutual Information Optimization using Self-Supervised Binary Contrastive Learning

Siladittya Manna; Saumik Bhattacharya; Umapada Pal

Self-supervised contrastive learning has progressed rapidly over the last few years. Most state-of-the-art self-supervised algorithms rely on a large number of negative samples, momentum updates, specific architectural modifications, or extensive training to learn good representations. Such arrangements make the overall training process complex and difficult to treat analytically. In this paper, we propose a mutual information optimization based loss function for contrastive learning, in which we model contrastive learning as a binary classification problem: predicting whether a pair is positive or not. This formulation not only makes the problem mathematically tractable but also helps us outperform existing algorithms. Unlike existing methods, which only maximize the mutual information in a positive pair, the proposed loss function optimizes the mutual information in both positive and negative pairs. We also present mathematical expressions for the parameter gradients flowing into the projector and for the displacement of the feature vectors in the feature space, which give mathematical insight into the working principle of contrastive learning. An additive L_2 regularizer is also used to prevent the feature vectors from diverging and to improve performance. The proposed method outperforms state-of-the-art algorithms on benchmark datasets such as STL-10, CIFAR-10, and CIFAR-100. After only 250 epochs of pre-training, the proposed model achieves best accuracies of 85.44%, 60.75%, and 56.81% on the CIFAR-10, STL-10, and CIFAR-100 datasets, respectively.
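
The abstract does not give the loss in closed form, so the following is only a rough sketch of what a binary-classification view of contrastive learning could look like, not the authors' exact formulation. It assumes that the pair score is a temperature-scaled dot product, that matching indices across two augmented views form positive pairs and all other combinations form negative pairs, and that positive and negative terms are averaged separately. The names `binary_contrastive_loss`, `temperature`, and `l2_weight` are hypothetical.

```python
import math


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def binary_contrastive_loss(view_a, view_b, temperature=1.0, l2_weight=0.0):
    """Binary cross-entropy over all cross-view pairs (illustrative sketch).

    view_a[i] and view_b[i] are feature vectors of two augmentations of
    sample i. Pair (i, i) is labelled positive, pair (i, j != i) negative.
    This is an assumed formulation, not the loss from the paper.
    """
    n = len(view_a)
    if n < 2 or len(view_b) != n:
        raise ValueError("need two views with the same batch size >= 2")

    pos_terms, neg_terms = [], []
    for i in range(n):
        for j in range(n):
            logit = dot(view_a[i], view_b[j]) / temperature
            if i == j:
                # -log(sigmoid(logit)): penalises low scores on positive pairs.
                pos_terms.append(softplus(-logit))
            else:
                # -log(1 - sigmoid(logit)): penalises high scores on negatives.
                neg_terms.append(softplus(logit))

    # Additive L_2 penalty: mean squared norm of all feature vectors.
    l2_penalty = sum(dot(v, v) for v in view_a + view_b) / (2 * n)

    bce = sum(pos_terms) / len(pos_terms) + sum(neg_terms) / len(neg_terms)
    return bce + l2_weight * l2_penalty


if __name__ == "__main__":
    aligned = [[5.0, 0.0], [0.0, 5.0]]
    shuffled = [[0.0, 5.0], [5.0, 0.0]]
    print(binary_contrastive_loss(aligned, aligned))   # matched views
    print(binary_contrastive_loss(aligned, shuffled))  # mismatched views
```

In this sketch both pair types contribute to the objective, which mirrors the abstract's point that positive and negative pairs are both optimized, and the L_2 term keeps the dot-product scores from being increased simply by growing the feature norms.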

updated: Wed Nov 24 2021 17:51:29 GMT+0000 (UTC)

published: Wed Nov 24 2021 17:51:29 GMT+0000 (UTC)

arXiv
