Retro: Reusing teacher projection head for efficient embedding distillation on Lightweight Models via Self-supervised Learning

Khanh-Binh Nguyen; Chae Jung Park

Self-supervised learning (SSL) is gaining attention for its ability to learn effective representations from large amounts of unlabeled data. Lightweight models can be distilled from larger self-supervised pre-trained models using contrastive and consistency constraints. Still, the different sizes of the teacher's and student's projection heads make it challenging for students to mimic the teacher's embedding accurately. We propose Retro, which reuses the teacher's projection head for students, and our experimental results demonstrate significant improvements over the state-of-the-art on all lightweight models. For instance, when training EfficientNet-B0 using ResNet-50/101/152 as teachers, our approach improves the linear evaluation accuracy on ImageNet to 66.9%, 69.3%, and 69.8%, respectively, with significantly fewer parameters.
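The idea of reusing the teacher's projection head can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how the student's features are matched to the head's input size, so the linear `adapter` below is an assumption, as are the cosine-based `consistency_loss`, the toy dimensions, and every function name. Random vectors stand in for backbone outputs.

```python
import math
import random


def linear(x, w):
    """Multiply a vector x (length n) by a weight matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def l2_normalize(v):
    """Scale v to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / norm for c in v]


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def consistency_loss(student_emb, teacher_emb):
    """1 - cosine similarity: 0 when the two embeddings point the same way."""
    s, t = l2_normalize(student_emb), l2_normalize(teacher_emb)
    return 1.0 - sum(a * b for a, b in zip(s, t))


rng = random.Random(0)
STUDENT_DIM, TEACHER_DIM, EMBED_DIM = 8, 16, 4  # hypothetical toy sizes

# The teacher's projection head: pre-trained and frozen, shared by both models.
teacher_head = random_matrix(TEACHER_DIM, EMBED_DIM, rng)
# Assumed trainable adapter lifting student features to the head's input size.
adapter = random_matrix(STUDENT_DIM, TEACHER_DIM, rng)


def teacher_embed(teacher_feat):
    """Teacher path: backbone features -> teacher projection head."""
    return linear(teacher_feat, teacher_head)


def student_embed(student_feat):
    """Student path: backbone features -> adapter -> the same teacher head."""
    return linear(linear(student_feat, adapter), teacher_head)


# Stand-ins for the two backbones' outputs on the same image.
student_feat = [rng.gauss(0.0, 1.0) for _ in range(STUDENT_DIM)]
teacher_feat = [rng.gauss(0.0, 1.0) for _ in range(TEACHER_DIM)]

loss = consistency_loss(student_embed(student_feat), teacher_embed(teacher_feat))
```

Because both paths end in the same head, the student and teacher embeddings live in the same space by construction, so in this sketch only the student backbone and the adapter would need to be trained against the loss.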

updated: Sat Aug 24 2024 13:23:40 GMT+0000 (UTC)

published: Fri May 24 2024 07:53:09 GMT+0000 (UTC)

arXiv
