It's All in the Head: Representation Knowledge Distillation through Classifier Sharing

Emanuel Ben-Baruch; Matan Karklinsky; Yossi Biton; Avi Ben-Cohen; Hussam Lawen; Nadav Zamir

Representation knowledge distillation aims to transfer rich information from one model to another. Common approaches to representation distillation focus mainly on directly minimizing distance metrics between the models' embedding vectors. Such direct methods may be limited in transferring high-order dependencies embedded in the representation vectors, or in handling the capacity gap between the teacher and student models. Moreover, in standard knowledge distillation, the teacher is trained without awareness of the student's characteristics and capacity. In this paper, we explore two mechanisms for enhancing representation distillation using classifier sharing between the teacher and student. We first investigate a simple scheme in which the teacher's classifier is connected to the student backbone, acting as an additional classification head. Then, we propose a student-aware mechanism that seeks to tailor the teacher model to a student with limited capacity by training the teacher with a temporary student head. We analyze and compare these two mechanisms and show their effectiveness on various datasets and tasks, including image classification, fine-grained classification, and face verification. In particular, we achieve state-of-the-art results for face verification on the IJB-C dataset for a MobileFaceNet model: TAR@(FAR=1e-5)=93.7%. Code is available at https://github.com/Alibaba-MIIL/HeadSharingKD.
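
The first mechanism in the abstract, attaching the teacher's classifier to the student backbone as an additional head, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes the student and teacher embeddings have the same dimension, that the teacher head is kept fixed, and that the two classification losses are simply summed with a hypothetical weight `alpha`. The function names and the toy numbers are also illustrative only.

```python
import math

def linear(weights, bias, x):
    """Apply a linear classifier: one row of weights (and one bias) per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def head_sharing_loss(student_embedding, label, student_head, teacher_head, alpha=1.0):
    """Classification loss of the student with the teacher's head as an extra head.

    student_head and teacher_head are (weights, bias) pairs. The teacher head
    is treated as fixed here (an assumption); alpha is a hypothetical weight,
    not a value taken from the paper.
    """
    own_logits = linear(*student_head, student_embedding)
    shared_logits = linear(*teacher_head, student_embedding)
    return (cross_entropy(own_logits, label)
            + alpha * cross_entropy(shared_logits, label))

# Toy usage: 2-dimensional embedding, 2 classes, true label 0.
embedding = [1.0, 0.0]
student_head = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])  # untrained student head
teacher_head = ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0])  # stand-in for a trained teacher head
loss = head_sharing_loss(embedding, 0, student_head, teacher_head, alpha=0.5)
```

In this sketch the second term is small only when the student embedding is one that the teacher's classifier already separates well, which is one way to read how sharing the head pushes the student's representation toward the teacher's.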

updated: Tue Apr 05 2022 14:39:02 GMT+0000 (UTC)

published: Tue Jan 18 2022 13:10:36 GMT+0000 (UTC)

arXiv
