Similarity Transfer for Knowledge Distillation

Haoran Zhao; Kun Gong; Xin Sun; Junyu Dong; Hui Yu

Knowledge distillation is a popular paradigm for learning portable neural networks by transferring knowledge from a large model to a smaller one. Most existing approaches enhance the student model with the instance-level similarity information between categories that the teacher model provides. However, these works ignore the similarity correlation between different instances, which plays an important role in confidence prediction. To tackle this issue, we propose a novel method, similarity transfer for knowledge distillation (STKD), which aims to fully utilize the similarities between the categories of multiple samples. Furthermore, we propose to better capture the similarity correlation between different instances with the mixup technique, which creates virtual samples by weighted linear interpolation. Note that our distillation loss can fully utilize the similarities among incorrect classes through the mixed labels. The proposed approach improves the performance of the student model, as a virtual sample created from multiple images produces similar probability distributions in the teacher and student networks. Experiments and ablation studies on several public classification datasets, including CIFAR-10, CIFAR-100, CINIC-10, and Tiny-ImageNet, verify that this lightweight method effectively boosts the performance of the compact student model. The results show that STKD substantially outperforms vanilla knowledge distillation and achieves higher accuracy than state-of-the-art knowledge distillation methods.
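The abstract gives no formulas, so the following is only a minimal sketch of the two ingredients it names: a virtual sample built by weighted linear interpolation (mixup), and a distillation term that pushes the student's output distribution on that sample toward the teacher's. It makes several assumptions that do not come from the abstract: the mixing weight `lam` is drawn from a Beta distribution (the usual mixup convention), the distillation term is the temperature-softened KL divergence of vanilla knowledge distillation, and the logits are made-up numbers standing in for real network outputs. The function names are hypothetical, and this should not be read as the exact STKD loss.

```python
import math
import random


def mixup(x_a, x_b, y_a, y_b, lam):
    """Blend two inputs and their one-hot labels with weight lam in [0, 1]."""
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distill_loss(teacher_logits, student_logits, temperature=4.0):
    """Assumed vanilla-KD term: T^2 * KL(teacher || student) on softened outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


if __name__ == "__main__":
    random.seed(0)
    # Assumption: lam ~ Beta(alpha, alpha) with alpha = 1.0, as in standard mixup.
    lam = random.betavariate(1.0, 1.0)

    # Two toy "images" (flattened to two pixels) from classes 0 and 1 of 3.
    x_mix, y_mix = mixup([0.0, 1.0], [1.0, 0.0],
                         [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], lam)

    # Made-up logits that a teacher and a student might emit for x_mix.
    teacher_logits = [2.0, 1.5, -1.0]
    student_logits = [1.0, 1.0, 0.0]

    print("lambda:", lam)
    print("mixed label:", y_mix)
    print("distillation term:", distill_loss(teacher_logits, student_logits))
```

In a real training loop the student would typically also be trained with a cross-entropy term against the mixed label `y_mix`; how the terms are combined and weighted is a design choice the abstract does not specify.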

updated: Thu Mar 18 2021 06:54:59 GMT+0000 (UTC)

published: Thu Mar 18 2021 06:54:59 GMT+0000 (UTC)

arXiv
