DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning

Yuting Gao; Jia-Xin Zhuang; Ke Li; Hao Cheng; Xiaowei Guo; Feiyue Huang; Rongrong Ji; Xing Sun

While self-supervised representation learning (SSL) has received widespread attention from the community, recent research argues that its performance falls off sharply as the model size decreases. Current methods mainly rely on contrastive learning to train the network. In this work, we propose a simple yet effective Distilled Contrastive Learning (DisCo) to ease the issue by a large margin. Specifically, we find that the final embedding obtained by mainstream SSL methods contains the most fruitful information, and we propose to distill the final embedding to maximally transmit a teacher's knowledge to a lightweight model by constraining the last embedding of the student to be consistent with that of the teacher. In addition, in our experiments we find a phenomenon we term Distilling BottleNeck, and we propose enlarging the embedding dimension to alleviate it. Our method does not introduce any extra parameters to lightweight models during deployment. Experimental results demonstrate that our method achieves state-of-the-art results on all lightweight models. In particular, when ResNet-101/ResNet-50 is used as the teacher for EfficientNet-B0, the linear evaluation result of EfficientNet-B0 on ImageNet is very close to that of ResNet-101/ResNet-50, while EfficientNet-B0 has only 9.4%/16.3% of the parameters of ResNet-101/ResNet-50.
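The core idea in the abstract, constraining the student's last embedding to be consistent with the teacher's, can be illustrated with a small sketch. The abstract does not state the exact form of the consistency loss, so the squared distance between L2-normalized embeddings used here is an assumption, and the function names `l2_normalize` and `embedding_consistency_loss` are hypothetical. Embeddings are plain Python lists so that the example runs without any deep learning framework.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def embedding_consistency_loss(student_batch, teacher_batch):
    """Average squared distance between normalized student and teacher embeddings.

    Assumption: the paper's abstract only says the embeddings are constrained
    to be consistent; this particular loss is an illustrative choice.
    """
    if len(student_batch) != len(teacher_batch) or not student_batch:
        raise ValueError("batches must be non-empty and of equal size")
    total = 0.0
    for student, teacher in zip(student_batch, teacher_batch):
        if len(student) != len(teacher):
            raise ValueError("student and teacher embeddings must share a dimension")
        s = l2_normalize(student)
        t = l2_normalize(teacher)
        # Squared Euclidean distance between the two unit vectors.
        total += sum((a - b) ** 2 for a, b in zip(s, t))
    return total / len(student_batch)


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 2.0]]
    aligned_student = [[3.0, 0.0], [0.0, 0.5]]   # same directions as the teacher
    misaligned_student = [[0.0, 1.0], [1.0, 0.0]]  # orthogonal to the teacher
    print(embedding_consistency_loss(aligned_student, teacher))
    print(embedding_consistency_loss(misaligned_student, teacher))
```

In this sketch the loss depends only on the direction of each embedding, so a student that matches the teacher up to scale incurs no penalty. In a training setup the teacher embeddings would be treated as fixed targets and only the student would be updated; that detail is also an assumption about how such a loss is typically used, not something the abstract spells out.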

updated: Wed Jul 14 2021 11:29:35 GMT+0000 (UTC)

published: Mon Apr 19 2021 08:22:52 GMT+0000 (UTC)

arXiv
