Collaborative Multi-Teacher Knowledge Distillation for Learning Low Bit-width Deep Neural Networks

Cuong Pham; Tuan Hoang; Thanh-Toan Do

Knowledge distillation, which learns a lightweight student model by distilling knowledge from a cumbersome teacher model, is an attractive approach for learning compact deep neural networks (DNNs). Recent works further improve student network performance by leveraging multiple teacher networks. However, most existing knowledge distillation-based multi-teacher methods use separately pretrained teachers. This limits both the collaborative learning between teachers and the mutual learning between the teachers and the student. Network quantization is another attractive approach for learning compact DNNs. However, most existing network quantization methods are developed and evaluated without considering multi-teacher support to enhance the performance of the quantized student model. In this paper, we propose a novel framework that leverages both multi-teacher knowledge distillation and network quantization for learning low bit-width DNNs. The proposed method encourages both collaborative learning between quantized teachers and mutual learning between the quantized teachers and the quantized student. During the learning process, at corresponding layers, knowledge from the teachers forms an importance-aware shared knowledge, which is used as input for the teachers at subsequent layers and also to guide the student. Our experimental results on the CIFAR100 and ImageNet datasets show that the compact quantized student models trained with our method achieve competitive results compared to other state-of-the-art methods and, in some cases, surpass the full-precision models.
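The shared-knowledge step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how importance is computed or which loss guides the student, so everything below is an assumption made for illustration: the softmax over per-teacher scores, the mean-squared-error guidance term, and the names `softmax`, `shared_knowledge`, and `guidance_loss` are all hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Turn raw importance scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def shared_knowledge(teacher_features, importance_scores):
    """Importance-weighted sum of per-teacher feature vectors at one layer.

    teacher_features: one equal-length feature vector per teacher.
    importance_scores: one raw score per teacher (assumed learnable).
    """
    weights = softmax(importance_scores)
    dim = len(teacher_features[0])
    return [
        sum(w * feats[i] for w, feats in zip(weights, teacher_features))
        for i in range(dim)
    ]

def guidance_loss(student_features, shared):
    """Assumed guidance term: mean squared error between student and shared features."""
    return sum((s - t) ** 2 for s, t in zip(student_features, shared)) / len(shared)

# Two hypothetical teachers with equal scores: the shared knowledge is their mean.
shared = shared_knowledge([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
loss = guidance_loss([2.0, 3.0], shared)
```

In this toy setting the same `shared` vector would be passed on as input to each teacher's next layer and compared against the student's features at the corresponding layer. Quantization of weights and activations is left out of the sketch.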

updated: Thu Oct 27 2022 01:03:39 GMT+0000 (UTC)

published: Thu Oct 27 2022 01:03:39 GMT+0000 (UTC)

arXiv
