Multi-granularity for knowledge distillation

Baitan Shao; Ying Chen

Because students differ in their ability to understand the knowledge imparted by teachers, a multi-granularity distillation mechanism is proposed to transfer more understandable knowledge to student networks. A multi-granularity self-analyzing module is designed for the teacher network, enabling the student network to learn from different teaching patterns. Furthermore, a stable excitation scheme is proposed to provide robust supervision for student training. The proposed mechanism can be embedded into different distillation frameworks, which are taken as baselines. Experiments show that the mechanism improves accuracy by 0.58% on average and by up to 1.08% over the baselines, outperforming state-of-the-art methods. The experiments also show that the mechanism can improve the student's fine-tuning ability and its robustness to noisy inputs. The code is available at https://github.com/shaoeric/multi-granularity-distillation.
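For orientation, the kind of baseline objective that such a mechanism would plug into can be sketched as follows. This is a minimal illustration of the classic temperature-scaled distillation loss (KL divergence between softened teacher and student distributions), not the authors' implementation. The abstract does not say how the multi-granularity module or the stable excitation scheme are built, so `multi_target_kd_loss` is a purely hypothetical stand-in that averages the loss over several teacher outputs; all function names, the temperature value, and the uniform weighting are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Classic distillation loss: T^2 * KL(teacher || student) on softened outputs."""
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def multi_target_kd_loss(student_logits, teacher_logits_list, temperature=4.0):
    """Hypothetical stand-in (assumption, not from the paper): average the
    distillation loss over several teacher outputs, e.g. one per granularity."""
    losses = [kd_loss(student_logits, t, temperature) for t in teacher_logits_list]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    student = [1.0, 0.5, -0.2]
    teacher_outputs = [[2.0, 0.1, -1.0], [1.5, 0.4, -0.5]]  # made-up logits
    print(kd_loss(student, teacher_outputs[0]))
    print(multi_target_kd_loss(student, teacher_outputs))
```

The loss is zero when student and teacher logits coincide and grows as the softened distributions diverge. The repository linked in the abstract is the authoritative reference for the actual method.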

updated: Sun Aug 15 2021 07:47:08 GMT+0000 (UTC)

published: Sun Aug 15 2021 07:47:08 GMT+0000 (UTC)

arXiv
