Balanced Knowledge Distillation for Long-tailed Learning

Shaoyu Zhang; Chen Chen; Xiyuan Hu; Silong Peng


Deep models trained on long-tailed datasets perform poorly on tail classes. Existing methods usually modify the classification loss to increase the learning focus on tail classes, which unexpectedly sacrifices performance on head classes. In fact, this scheme creates a contradiction between the two goals of long-tailed learning, i.e., learning generalizable representations and facilitating learning for tail classes. In this work, we explore knowledge distillation in long-tailed scenarios and propose a novel distillation framework, named Balanced Knowledge Distillation (BKD), to resolve the contradiction between the two goals and achieve both simultaneously. Specifically, given a vanilla teacher model, we train the student model by minimizing the combination of an instance-balanced classification loss and a class-balanced distillation loss. The former benefits from sample diversity and learns generalizable representations, while the latter takes the class priors into account and mainly facilitates learning for tail classes. The student model trained with BKD achieves a significant performance gain, even over its teacher model. We conduct extensive experiments on several long-tailed benchmark datasets and demonstrate that the proposed BKD is an effective knowledge distillation framework in long-tailed scenarios, as well as a new state-of-the-art method for long-tailed learning. Code is available at https://github.com/EricZsy/BalancedKnowledgeDistillation.
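The combined objective described in the abstract can be sketched for a single sample in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation (the linked repository has that). The per-class weights are assumed to follow the "effective number of samples" reweighting of Cui et al. (2019), one possible choice of class-balanced weights. The function names `class_balanced_weights` and `bkd_loss`, and the parameters `beta`, `temperature` and `alpha`, are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Numerically stable softmax with temperature scaling.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def class_balanced_weights(class_counts: Sequence[int], beta: float = 0.999) -> List[float]:
    # Assumption: weight_i = (1 - beta) / (1 - beta ** n_i), i.e. the inverse
    # "effective number" of samples of class i, normalized so the weights sum
    # to the number of classes. Rarer (tail) classes get larger weights.
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in class_counts]
    scale = len(class_counts) / sum(raw)
    return [w * scale for w in raw]


def bkd_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    class_weights: Sequence[float],
    temperature: float = 2.0,
    alpha: float = 1.0,
) -> float:
    # Instance-balanced classification term: plain cross-entropy on the
    # ground-truth label, so every sample counts equally.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[label])

    # Class-balanced distillation term: the KL divergence between softened
    # teacher and student distributions, with each class's term scaled by
    # that class's weight (hypothetical weighting scheme, see lead-in).
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kd = sum(
        w * pt * math.log(pt / ps)
        for w, pt, ps in zip(class_weights, p_t, p_s)
        if pt > 0.0
    )

    # The temperature ** 2 factor (Hinton et al., 2015) keeps the distillation
    # gradients on a comparable scale across temperatures.
    return ce + alpha * temperature ** 2 * kd


if __name__ == "__main__":
    weights = class_balanced_weights([1000, 100, 10])
    print(weights)  # head class gets the smallest weight, tail class the largest
    print(bkd_loss([2.0, 0.5, -1.0], [1.5, 0.8, 0.2], label=0, class_weights=weights))
```

The design choice this sketch illustrates is where the reweighting is applied: the cross-entropy term stays unweighted, so head-class samples still contribute their full diversity to representation learning, while only the distillation term is reweighted toward tail classes. In a real training loop the same computation would be batched over tensors in a deep learning framework.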

updated: Wed Apr 21 2021 13:07:35 GMT+0000 (UTC)

published: Wed Apr 21 2021 13:07:35 GMT+0000 (UTC)

arXiv
