Refine Myself by Teaching Myself: Feature Refinement via Self-Knowledge Distillation

Mingi Ji; Seungjae Shin; Seunghyun Hwang; Gibeom Park; Il-Chul Moon

Knowledge distillation transfers knowledge from a pretrained, complex teacher model to a student model, so that a smaller network can replace the large teacher network at the deployment stage. To reduce the need to train a large teacher model, recent literature has introduced self-knowledge distillation, which trains a student network progressively to distill its own knowledge without a pretrained teacher network. Self-knowledge distillation is largely divided into data augmentation based approaches and auxiliary network based approaches. The data augmentation approach loses local information in the augmentation process, which hinders its applicability to diverse vision tasks such as semantic segmentation. Moreover, these knowledge distillation approaches do not make use of refined feature maps, which are prevalent in the object detection and semantic segmentation communities. This paper proposes a novel self-knowledge distillation method, Feature Refinement via Self-Knowledge Distillation (FRSKD), which utilizes an auxiliary self-teacher network to transfer refined knowledge to the classifier network. FRSKD can utilize both soft-label distillation and feature-map distillation for self-knowledge distillation. Therefore, FRSKD can be applied to classification and to semantic segmentation, a task that emphasizes preserving local information. We demonstrate the effectiveness of FRSKD by reporting its performance improvements across diverse tasks and benchmark datasets. The implemented code is available at https://github.com/MingiJi/FRSKD.
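
As a rough illustration of how a soft-label term and a feature-map term can be combined into one distillation objective, the sketch below uses plain Python. It is not the loss from the FRSKD repository: the abstract does not give the loss forms, so the temperature-softened KL divergence, the mean squared error on flattened feature maps, and the names `soft_label_loss`, `feature_map_loss`, `self_distillation_loss`, `alpha`, `beta`, and `temperature` are all assumptions made for this example.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def soft_label_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumed form for illustration; the abstract does not specify it.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def feature_map_loss(student_feat, teacher_feat):
    """Mean squared error between two flattened feature maps (assumed form)."""
    if len(student_feat) != len(teacher_feat):
        raise ValueError("feature maps must have the same number of elements")
    squared = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(squared) / len(squared)


def self_distillation_loss(student_logits, teacher_logits,
                           student_feat, teacher_feat,
                           alpha=1.0, beta=1.0, temperature=4.0):
    """Weighted sum of the two terms; alpha and beta are hypothetical weights."""
    soft = soft_label_loss(student_logits, teacher_logits, temperature)
    feat = feature_map_loss(student_feat, teacher_feat)
    return alpha * soft + beta * feat


if __name__ == "__main__":
    # Toy values: logits from the classifier and from a self-teacher branch,
    # plus one flattened feature map from each.
    loss = self_distillation_loss(
        student_logits=[2.0, 0.5, -1.0],
        teacher_logits=[2.5, 0.0, -1.5],
        student_feat=[0.1, 0.4, 0.3, 0.9],
        teacher_feat=[0.2, 0.5, 0.1, 1.0],
    )
    print(f"combined distillation loss: {loss:.4f}")
```

In a real training loop both terms would be computed on tensors and added to the usual cross-entropy loss. The feature-map term is the part that keeps spatial (local) information in play, which is the property the abstract highlights for semantic segmentation.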

updated: Mon Mar 15 2021 10:59:43 GMT+0000 (UTC)

published: Mon Mar 15 2021 10:59:43 GMT+0000 (UTC)

arXiv
