Distilling a Powerful Student Model via Online Knowledge Distillation

Shaojie Li; Mingbao Lin; Yan Wang; Yongjian Wu; Yonghong Tian; Ling Shao; Rongrong Ji

Existing online knowledge distillation approaches either adopt the student with the best performance or construct an ensemble model for better holistic performance. However, the former strategy ignores the other students' information, while the latter increases the computational complexity during deployment. In this paper, we propose a novel method for online knowledge distillation, termed FFSD, which comprises two key components, Feature Fusion and Self-Distillation, and solves both problems in a unified framework. Unlike previous works, which treat all students equally, FFSD splits them into a leader student and a set of common students. The feature fusion module then converts the concatenation of feature maps from all common students into a fused feature map, and this fused representation is used to assist the learning of the leader student. To enable the leader student to absorb more diverse information, we design an enhancement strategy that increases the diversity among students. In addition, a self-distillation module converts the feature maps of deeper layers into shallower-layer representations, and the shallower layers are encouraged to mimic these transformed feature maps, which helps the students generalize better. After training, we simply deploy the leader student, which outperforms the common students without increasing the storage or inference cost. Extensive experiments on CIFAR-100 and ImageNet demonstrate the superiority of FFSD over existing works. The code is available at https://github.com/SJLeo/FFSD.
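
As a rough illustration of the two mechanisms described above, the following minimal sketch computes a feature-fusion loss and a self-distillation loss with NumPy. It works on single (C, H, W) feature maps with no batch dimension and no gradients. The following choices are all assumptions made for illustration, not the architecture or losses used in the FFSD code release:

- a single 1x1 convolution as the fusion layer,
- a 1x1 convolution plus nearest-neighbour upsampling as the deep-to-shallow transform,
- mean-squared-error objectives,
- every shape and helper name, all of which are hypothetical.

The diversity-enhancement strategy is not modelled.

```python
from typing import Sequence

import numpy as np


def conv1x1(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """1x1 convolution on one feature map: x (C_in, H, W), weight (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)


def fuse_common_features(common_maps: Sequence[np.ndarray], weight: np.ndarray) -> np.ndarray:
    """Concatenate the common students' maps along channels, then project back to C channels.

    Assumption: the fusion module is a single 1x1 convolution (hypothetical simplification).
    """
    stacked = np.concatenate(list(common_maps), axis=0)  # (K*C, H, W)
    return conv1x1(stacked, weight)                      # (C, H, W)


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two equally shaped arrays (assumed mimicry objective)."""
    return float(np.mean((a - b) ** 2))


def upsample_nearest(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def self_distillation_loss(shallow: np.ndarray, deep: np.ndarray, weight: np.ndarray) -> float:
    """The shallow map mimics the deep map after it is transformed to the shallow shape.

    Assumption: transform = 1x1 conv (channel projection) + nearest upsampling.
    """
    factor = shallow.shape[1] // deep.shape[1]
    target = upsample_nearest(conv1x1(deep, weight), factor)
    return mse(shallow, target)


rng = np.random.default_rng(0)
C, H, W, K = 4, 8, 8, 3  # hypothetical: channels, spatial size, number of common students

# Feature fusion: common students -> fused map -> guidance for the leader student.
common = [rng.standard_normal((C, H, W)) for _ in range(K)]
w_fuse = rng.standard_normal((C, K * C)) / np.sqrt(K * C)
fused = fuse_common_features(common, w_fuse)
leader = rng.standard_normal((C, H, W))
loss_fusion = mse(leader, fused)

# Self-distillation: a deeper map (more channels, lower resolution) teaches a shallower one.
deep = rng.standard_normal((2 * C, H // 2, W // 2))
w_sd = rng.standard_normal((C, 2 * C)) / np.sqrt(2 * C)
loss_sd = self_distillation_loss(common[0], deep, w_sd)

print(fused.shape, round(loss_fusion, 3), round(loss_sd, 3))
```

In actual training these arrays would be framework tensors, so that the leader student and the shallow layers receive gradients from the two losses. Whether the fused map and the transformed deep map are also updated through these losses is a design choice the abstract does not specify.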

updated: Thu Feb 17 2022 02:47:29 GMT+0000 (UTC)

published: Fri Mar 26 2021 13:54:24 GMT+0000 (UTC)

arXiv