It's All in the Head: Representation Knowledge Distillation through Classifier Sharing

Emanuel Ben-Baruch; Matan Karklinsky; Yossi Biton; Avi Ben-Cohen; Hussam Lawen; Nadav Zamir

Representation knowledge distillation aims to transfer rich information from one model to another. Current approaches to representation distillation mainly focus on directly minimizing distance metrics between the models' embedding vectors. Such direct methods may be limited in transferring high-order dependencies embedded in the representation vectors, or in handling the capacity gap between the teacher and student models. In this paper, we introduce two approaches for enhancing representation distillation by sharing classifiers between the teacher and student. Specifically, we first show that connecting the teacher's classifier to the student backbone and freezing its parameters benefits representation distillation, yielding consistent improvements. Then, we propose an alternative approach that tailors the teacher model to a student with limited capacity. This approach competes with, and in some cases surpasses, the first method. Through extensive experiments and analysis, we show the effectiveness of the proposed methods on various datasets and tasks, including image classification, fine-grained classification, and face verification. For example, we achieve state-of-the-art face verification performance on the IJB-C dataset for a MobileFaceNet model: TAR@(FAR=1e-5) = 93.7% (true accept rate at a false accept rate of 10^-5). Code is available at https://github.com/Alibaba-MIIL/HeadSharingKD.
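The first idea can be made concrete with a minimal, hypothetical sketch of a head-sharing distillation loss in plain Python. It is not the authors' implementation (the linked repository has that), and it rests on several assumptions:

- a single training sample;
- teacher and student embeddings of the same dimension (in practice a projection layer may be needed);
- a linear teacher classifier given as weights and biases;
- a loss equal to cross-entropy through the frozen teacher head applied to the student embedding, plus `alpha` times the squared L2 distance between the two embeddings.

The function names, `alpha`, and the choice of squared L2 distance are illustrative. The student embedding is treated as a free variable that stands in for the student backbone's parameters. "Frozen" here simply means the gradient is taken with respect to the student side only, so the teacher head never changes.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Head = Tuple[List[List[float]], List[float]]  # (weights: classes x dim, bias: classes)


def logits_of(head: Head, emb: Sequence[float]) -> Vector:
    """Linear classifier head: logits = W @ emb + b."""
    weights, bias = head
    return [sum(w * e for w, e in zip(row, emb)) + b for row, b in zip(weights, bias)]


def softmax(z: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtracts the max logit first)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]


def cross_entropy(z: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, computed via log-sum-exp."""
    m = max(z)
    return m + math.log(sum(math.exp(v - m) for v in z)) - z[label]


def shared_head_loss(student_emb, teacher_emb, label, teacher_head, alpha=1.0):
    """Hypothetical loss (an assumption, not the paper's exact formula):
    CE of the frozen teacher head applied to the STUDENT embedding,
    plus alpha * squared L2 distance to the teacher embedding."""
    ce = cross_entropy(logits_of(teacher_head, student_emb), label)
    dist = sum((s - t) ** 2 for s, t in zip(student_emb, teacher_emb))
    return ce + alpha * dist


def grad_wrt_student(student_emb, teacher_emb, label, teacher_head, alpha=1.0):
    """Analytic gradient of shared_head_loss w.r.t. the student embedding only.

    The teacher head is a constant here, which is what freezing it means:
    dCE/demb = W^T (softmax(z) - onehot(label)),  d dist/demb = 2 (s - t).
    """
    weights, _ = teacher_head
    p = softmax(logits_of(teacher_head, student_emb))
    delta = [pk - (1.0 if k == label else 0.0) for k, pk in enumerate(p)]
    grad = []
    for j in range(len(student_emb)):
        g_ce = sum(weights[k][j] * delta[k] for k in range(len(weights)))
        grad.append(g_ce + 2.0 * alpha * (student_emb[j] - teacher_emb[j]))
    return grad


if __name__ == "__main__":
    # Toy numbers (made up): a 3-class teacher head over 2-D embeddings.
    teacher_head = ([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]], [0.0, 0.0, 0.0])
    teacher_emb = [2.0, 0.5]
    student_emb = [0.0, 0.0]  # stands in for the student backbone's output
    label, lr = 0, 0.1
    start = shared_head_loss(student_emb, teacher_emb, label, teacher_head)
    for _ in range(50):
        # Only the student side is updated; teacher_head is never modified.
        g = grad_wrt_student(student_emb, teacher_emb, label, teacher_head)
        student_emb = [s - lr * gj for s, gj in zip(student_emb, g)]
    end = shared_head_loss(student_emb, teacher_emb, label, teacher_head)
    print(f"loss {start:.3f} -> {end:.3f}")
```

In a framework with autograd, you would typically freeze the teacher head by disabling gradients on its parameters (for example `requires_grad_(False)` on a PyTorch module) rather than deriving the gradient by hand. The cross-entropy term is what goes beyond a pointwise distance: it pushes the student embedding toward regions that the teacher's own decision boundaries classify correctly.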

Updated: Tue Jan 18 2022 13:10:36 UTC

Published: Tue Jan 18 2022 13:10:36 UTC

arXiv
