Self-Distilled Self-Supervised Representation Learning

Jiho Jang; Seonhoon Kim; Kiyoon Yoo; Jangho Kim; Nojun Kwak

State-of-the-art frameworks in self-supervised learning have recently shown that fully utilizing transformer-based models can lead to a performance boost compared to conventional CNN models. Striving to maximize the mutual information of two views of an image, existing works apply a contrastive loss to the final representations. In our work, we further exploit this by allowing the intermediate representations to learn from the final layers via the contrastive loss, which maximizes the upper bound of the original goal and the mutual information between two layers. Our method, Self-Distilled Self-Supervised Learning (SDSSL), outperforms competitive baselines (SimCLR, BYOL and MoCo v3) using ViT on various tasks and datasets. In the linear evaluation and k-NN protocols, SDSSL leads to superior performance not only in the final layers but also in most of the lower layers. Furthermore, positive and negative alignments are used to explain how representations are formed more effectively. Code will be available.
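The idea of letting intermediate representations learn from the final layer through a contrastive loss can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss weighting, the projection heads, or the gradient flow, so the function names (`info_nce`, `sdssl_style_loss`), the `temperature` and `distill_weight` parameters, and the choice of an InfoNCE-style loss are all assumptions made for illustration. The sketch uses plain Python lists and assumes every layer's features have already been projected to a common dimension.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.2):
    """Mean InfoNCE-style loss over a batch.

    queries[i] and keys[i] form the positive pair; every other key in the
    batch acts as a negative for queries[i].
    """
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp over all keys.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(q)


def sdssl_style_loss(view1_layers, view2_layers, distill_weight=1.0):
    """Hypothetical combined objective (an assumption, not the paper's formula).

    view1_layers / view2_layers: one entry per layer, each a list of
    per-image feature vectors for that view.
    """
    final1, final2 = view1_layers[-1], view2_layers[-1]
    # Usual contrastive term between the final representations of two views.
    base = info_nce(final1, final2)
    # Self-distillation term: each intermediate layer of view 1 is pulled
    # toward the final layer of view 2. In a real framework the final layer
    # would presumably be treated as a fixed target here (assumption).
    intermediate = view1_layers[:-1]
    if not intermediate:
        return base
    distill = sum(info_nce(h, final2) for h in intermediate) / len(intermediate)
    return base + distill_weight * distill


# Toy usage: two images, two layers, 2-D features per layer.
view1 = [[[0.9, 0.1], [0.2, 0.8]], [[1.0, 0.0], [0.0, 1.0]]]
view2 = [[[0.8, 0.2], [0.1, 0.9]], [[1.0, 0.1], [0.1, 1.0]]]
loss = sdssl_style_loss(view1, view2)
```

Setting `distill_weight=0.0` in this sketch recovers the plain final-layer contrastive loss, which makes the extra intermediate-layer term easy to isolate when comparing the two objectives.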

updated: Thu Nov 25 2021 07:52:36 GMT+0000 (UTC)

published: Thu Nov 25 2021 07:52:36 GMT+0000 (UTC)

arXiv
