Follow Your Path: a Progressive Method for Knowledge Distillation

Wenxian Shi; Yuxuan Song; Hao Zhou; Bohan Li; Lei Li

Deep neural networks often have a huge number of parameters, which poses challenges for deployment in application scenarios with limited memory and computation capacity. Knowledge distillation is one approach to deriving compact models from larger ones. However, it has been observed that a converged heavy teacher model imposes a strong constraint on learning a compact student network and can leave the optimization subject to poor local optima. In this paper, we propose ProKT, a new model-agnostic method that projects the supervision signals of a teacher model into the student's parameter space. This projection is implemented by decomposing the training objective into local intermediate targets with an approximate mirror descent technique. The proposed method could be less sensitive to quirks during optimization, which could result in a better local optimum. Experiments on both image and text datasets show that ProKT consistently achieves superior performance compared to other existing knowledge distillation methods.
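
For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of the commonly used temperature-softened distillation loss, in which a student is trained against both the hard label and a fixed teacher's softened output distribution. It is not the ProKT procedure itself: the abstract does not spell out the projection or the mirror descent update, so neither is attempted here. The function names, the `temperature` and `alpha` parameters, and their default values are all assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature (assumed parameter)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and teacher-student KL.

    alpha and temperature are illustrative hyperparameters, not values
    taken from the paper.
    """
    # Cross-entropy of the student against the ground-truth label.
    student_probs = softmax(student_logits)
    cross_entropy = -math.log(student_probs[label] + 1e-12)

    # KL between softened teacher and student distributions; the T^2 factor
    # is the usual rescaling that keeps gradient magnitudes comparable.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_kl = kl_divergence(soft_teacher, soft_student) * temperature ** 2

    return alpha * cross_entropy + (1.0 - alpha) * soft_kl


if __name__ == "__main__":
    # A student that disagrees with the teacher incurs a larger loss.
    print(distillation_loss([0.0, 0.0, 5.0], [5.0, 0.0, 0.0], label=0))
    print(distillation_loss([5.0, 0.0, 0.0], [5.0, 0.0, 0.0], label=0))
```

In this baseline the teacher's logits are fixed throughout training, which is the setting the abstract describes as strongly constraining for a compact student; the intermediate targets proposed in the paper would stand in for that fixed teacher signal.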

updated: Tue Jul 20 2021 07:44:33 GMT+0000 (UTC)

published: Tue Jul 20 2021 07:44:33 GMT+0000 (UTC)

arXiv
