An Embarrassingly Simple Approach for Knowledge Distillation

Mengya Gao; Yujun Shen; Quanquan Li; Junjie Yan; Liang Wan; Dahua Lin; Chen Change Loy; Xiaoou Tang

Knowledge Distillation (KD) aims to improve the performance of a low-capacity student model by inheriting knowledge from a high-capacity teacher model. Previous KD methods typically train a student by minimizing a task-related loss and the KD loss simultaneously, using a pre-defined loss weight to balance the two terms. In this work, we propose to first transfer the backbone knowledge from the teacher to the student, and then learn only the task head of the student network. Decomposing the training process in this way removes the need to choose an appropriate loss weight, which is often difficult in practice, and thus makes the method easier to apply to different datasets and tasks. Importantly, the decomposition enables the core of our method, Stage-by-Stage Knowledge Distillation (SSKD), which facilitates progressive feature mimicking from teacher to student. Extensive experiments on CIFAR-100 and ImageNet suggest that SSKD significantly narrows the performance gap between student and teacher, outperforming state-of-the-art approaches. We also demonstrate the generalization ability of SSKD on other challenging benchmarks, including face recognition on the IJB-A dataset and object detection on the COCO dataset.
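The decomposition described in the abstract can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the data, the two-layer teacher, the linear student backbone, and the helper names `mimic_backbone` and `train_head` are all assumptions made for illustration, and the progressive stage-by-stage mimicking of SSKD is not modelled. The sketch only shows the two phases: the backbone is first fitted to teacher features with a mimicking loss alone, then it is frozen and only the task head is trained with the task loss alone, so no weight between the two losses is ever needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 256 samples, 8 input dimensions (all sizes are arbitrary).
X = rng.normal(size=(256, 8))

# Hypothetical frozen teacher backbone: two layers producing 4-d features.
A = rng.normal(size=(8, 16)) * 0.5
B = rng.normal(size=(16, 4)) * 0.5
teacher_feat = np.tanh(X @ A) @ B

# Labels for 3 classes, derived from teacher features so the task is learnable.
C = rng.normal(size=(4, 3))
y = np.argmax(teacher_feat @ C, axis=1)


def mimic_backbone(X, target, steps=500, lr=0.05):
    """Phase 1: fit a linear student backbone to teacher features (MSE only)."""
    W = np.zeros((X.shape[1], target.shape[1]))
    losses = []
    for _ in range(steps):
        diff = X @ W - target
        losses.append(float(np.mean(diff ** 2)))
        # Gradient of the mean squared error with respect to W.
        W -= lr * (2.0 / diff.size) * (X.T @ diff)
    return W, losses


def train_head(feat, y, n_classes, steps=500, lr=0.1):
    """Phase 2: train only a softmax head on frozen features (task loss only)."""
    H = np.zeros((feat.shape[1], n_classes))
    onehot = np.eye(n_classes)[y]
    losses = []
    for _ in range(steps):
        logits = feat @ H
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        losses.append(float(-np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))))
        # Gradient of the mean cross-entropy with respect to H.
        H -= lr * feat.T @ (p - onehot) / len(y)
    return H, losses


# Phase 1: no labels and no task loss are involved.
W_s, mimic_losses = mimic_backbone(X, teacher_feat)
W_frozen = W_s.copy()

# Phase 2: the backbone W_s is held fixed; only the head H_s is learned.
H_s, head_losses = train_head(X @ W_s, y, n_classes=3)
accuracy = float(np.mean(np.argmax(X @ W_s @ H_s, axis=1) == y))
```

Because each phase minimizes a single loss, the only hyperparameters in this sketch are per-phase step counts and learning rates, in contrast to a joint objective that would also need a balancing weight.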

updated: Sun Sep 08 2019 16:46:52 GMT+0000 (UTC)

published: Wed Dec 05 2018 05:09:45 GMT+0000 (UTC)

arXiv
