Meta Learning for Knowledge Distillation

Wangchunshu Zhou; Canwen Xu; Julian McAuley

We present Meta Learning for Knowledge Distillation (MetaDistil), a simple yet effective alternative to traditional knowledge distillation (KD) methods, in which the teacher model is fixed during training. We show that, within a meta-learning framework, the teacher network can learn to better transfer knowledge to the student network (i.e., learn to teach) by using feedback from the performance of the distilled student network. Moreover, we introduce a pilot update mechanism that improves the alignment between the inner-learner and the meta-learner in meta-learning algorithms that focus on an improved inner-learner. Experiments on various benchmarks show that MetaDistil can yield significant improvements over traditional KD algorithms and is less sensitive to student capacity and hyperparameter choices, which facilitates the use of KD on different tasks and models. The code is available at https://github.com/JetRunner/MetaDistil.
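To make the "learning to teach" loop concrete, here is a minimal sketch on a toy problem. It is not the authors' implementation. It assumes scalar linear models (student `s*x`, teacher `t*x`), a squared-error distillation loss in place of the usual soft-label loss, and a reading of the pilot update in which a trial student step is used only to compute the teacher's meta-gradient and is then discarded. All function names, the data, and the learning rates are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def kd_grad_student(s, t, xs):
    # Gradient w.r.t. s of the toy distillation loss mean((s*x - t*x)^2).
    return 2.0 * (s - t) * mean([x * x for x in xs])


def student_step(s, t, xs, lr_s):
    # One gradient step of the student toward the teacher's outputs.
    return s - lr_s * kd_grad_student(s, t, xs)


def quiz_loss(s, quiz):
    # Student error on held-out labelled "quiz" pairs (x, y).
    return mean([(s * x - y) ** 2 for x, y in quiz])


def quiz_grad_student(s, quiz):
    # Gradient of quiz_loss w.r.t. s.
    return mean([2.0 * (s * x - y) * x for x, y in quiz])


def teacher_meta_grad(s, t, xs, quiz, lr_s):
    # Pilot update: a trial student step that is never kept.
    s_pilot = student_step(s, t, xs, lr_s)
    # Chain rule through the pilot step: d(s_pilot)/dt = 2 * lr_s * mean(x^2).
    ds_pilot_dt = 2.0 * lr_s * mean([x * x for x in xs])
    return quiz_grad_student(s_pilot, quiz) * ds_pilot_dt


def metadistil_step(s, t, xs, quiz, lr_s, lr_t):
    # 1) Update the teacher using feedback from the pilot student.
    t_new = t - lr_t * teacher_meta_grad(s, t, xs, quiz, lr_s)
    # 2) Take the real student step from the original s with the new teacher.
    s_new = student_step(s, t_new, xs, lr_s)
    return s_new, t_new


# Toy data (assumed): the true relation is y = 3x, the teacher starts at 1.
xs = [1.0, 2.0]
quiz = [(1.0, 3.0), (2.0, 6.0)]
s, t = 0.0, 1.0
for _ in range(200):
    s, t = metadistil_step(s, t, xs, quiz, lr_s=0.1, lr_t=0.1)
print(s, t)
```

With a fixed teacher, the student in this toy setup could only converge to the teacher's initial slope of 1. Because the teacher is updated from the quiz feedback, both parameters move toward the slope that fits the quiz data.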

updated: Tue Jun 08 2021 17:59:03 GMT+0000 (UTC)

published: Tue Jun 08 2021 17:59:03 GMT+0000 (UTC)

arXiv
