ProxylessKD: Direct Knowledge Distillation with Inherited Classifier for Face Recognition

Weidong Shi; Guanghui Ren; Yunpeng Chen; Shuicheng Yan

Knowledge Distillation (KD) refers to transferring knowledge from a large model to a smaller one, and is widely used to improve model performance in machine learning. It tries to align the embedding spaces produced by the teacher and student models (i.e., to make images with the same semantics share the same embedding across models). In this work, we focus on its application to face recognition. We observe that existing knowledge distillation methods optimize proxy tasks that force the student to mimic the teacher's behavior, instead of directly optimizing face recognition accuracy. Consequently, the resulting student models are not guaranteed to be optimal on the target task, nor to benefit from advanced constraints such as large-margin constraints (e.g., margin-based softmax). We therefore propose a novel method named ProxylessKD that directly optimizes face recognition accuracy by inheriting the teacher's classifier as the student's classifier, which guides the student to learn discriminative embeddings in the teacher's embedding space. The proposed ProxylessKD is very easy to implement and sufficiently generic to be extended to tasks beyond face recognition. We conduct extensive experiments on standard face recognition benchmarks, and the results demonstrate that ProxylessKD achieves superior performance over existing knowledge distillation methods.
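
The idea of inheriting the teacher's classifier can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says the student reuses the teacher's classifier and may benefit from a margin-based softmax, so the additive angular margin used here, the function name `inherited_classifier_loss`, and the `scale` and `margin` defaults are all assumptions made for illustration. The teacher's class weights are treated as fixed inputs, and the loss is computed on the student's embedding against them.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def inherited_classifier_loss(student_embedding, teacher_class_weights,
                              label, scale=64.0, margin=0.5):
    """Margin-based softmax loss of a student embedding against fixed
    teacher class weights (hypothetical helper, illustration only).

    Assumption: an additive angular margin is applied to the target class.
    The usual special handling for theta + margin > pi is omitted.
    """
    e = l2_normalize(student_embedding)
    logits = []
    for k, w in enumerate(teacher_class_weights):
        # Cosine similarity between the student embedding and class k.
        cos = sum(a * b for a, b in zip(e, l2_normalize(w)))
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding
        if k == label:
            # Penalize the target class by widening its angle.
            cos = math.cos(math.acos(cos) + margin)
        logits.append(scale * cos)
    # Numerically stable softmax cross-entropy for the target class.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Toy teacher classifier with two identities (made-up values).
teacher_weights = [[1.0, 0.0], [0.0, 1.0]]
loss_aligned = inherited_classifier_loss([2.0, 0.0], teacher_weights, label=0)
loss_misaligned = inherited_classifier_loss([2.0, 0.0], teacher_weights, label=1)
```

In a real training loop only the student's parameters would receive gradients from this loss. Because the class weights come from the teacher and stay fixed, minimizing the loss pulls the student's embeddings toward the teacher's class centers, which is one way to read "learning discriminative embeddings in the teacher's embedding space".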

updated: Sat Oct 31 2020 13:14:34 GMT+0000 (UTC)

published: Sat Oct 31 2020 13:14:34 GMT+0000 (UTC)

arXiv
