A New Training Framework for Deep Neural Network

Zhenyan Hou; Wenxuan Fan

Knowledge distillation is the process of transferring knowledge from a large model to a small one. The small model learns the generalization ability of the large model and retains performance close to it. Distillation thus offers a training method for migrating knowledge between models, which eases deployment and speeds up inference. However, previous distillation methods require a pre-trained teacher model, which still incurs computational and storage overhead. In this paper, we propose a novel general training framework called Self Distillation (SD). We demonstrate the effectiveness of our method by reporting its performance improvements across diverse tasks and benchmark datasets.
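
For readers unfamiliar with the teacher-student setup that the abstract contrasts against, the following is a minimal sketch of a conventional distillation loss: a weighted sum of the cross-entropy on the true label and the KL divergence between temperature-softened teacher and student outputs. It is not the SD framework proposed in the paper, whose details the abstract does not give. The function names and the `temperature` and `alpha` parameters are illustrative assumptions, not taken from the source.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Conventional teacher-student loss (illustrative, not the paper's SD method).

    alpha weights the hard-label cross-entropy; (1 - alpha) weights the
    soft-target term, scaled by temperature**2 so that its gradient magnitude
    stays comparable across temperatures.
    """
    hard = -math.log(softmax(student_logits)[label])
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


if __name__ == "__main__":
    teacher = [5.0, 1.0, -2.0]   # hypothetical logits from a large model
    student = [2.0, 1.5, 0.0]    # hypothetical logits from a small model
    print(distillation_loss(student, teacher, label=0))
```

The teacher logits in this sketch must come from an already trained model, which is the pre-training overhead the abstract says earlier distillation methods incur.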

updated: Thu Mar 25 2021 01:51:00 GMT+0000 (UTC)

published: Fri Mar 12 2021 15:29:00 GMT+0000 (UTC)

arXiv
