Representation Consolidation for Training Expert Students

Zhizhong Li; Avinash Ravichandran; Charless Fowlkes; Marzia Polito; Rahul Bhotika; Stefano Soatto

Traditionally, distillation has been used to train a student model to emulate the input/output functionality of a teacher. A more useful goal than emulation, yet under-explored, is for the student to learn feature representations that transfer well to future tasks. However, we observe that standard distillation of task-specific teachers actually *reduces* the transferability of student representations to downstream tasks. We show that a multi-head, multi-task distillation method using an unlabeled proxy dataset and a generalist teacher is sufficient to consolidate representations from task-specific teacher(s) and improve downstream performance, outperforming the teacher(s) and the strong baseline of ImageNet pretrained features. Our method can also combine the representational knowledge of multiple teachers trained on one or multiple domains into a single model, whose representation is improved on all teachers' domain(s).
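The multi-head, multi-task setup described above can be illustrated with a minimal sketch. It assumes one linear head per teacher on top of a shared student representation, and a temperature-softened KL divergence as the per-head distillation loss. The abstract specifies neither the loss nor the head architecture, so those choices, along with all names below (`multi_head_distillation_loss`, `heads`, `teacher_logits`), are hypothetical and are not the authors' implementation.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def linear_head(features, weights):
    """Apply one linear head; `weights` holds one row per output class."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def multi_head_distillation_loss(features, heads, teacher_logits, temperature=2.0):
    """Sum of per-teacher distillation losses for one unlabeled proxy input.

    features:       shared student representation (list of floats)
    heads:          dict mapping teacher name -> head weight matrix
    teacher_logits: dict mapping teacher name -> that teacher's logits
                    on the same input
    The KL loss and linear heads are assumptions made for illustration.
    """
    total = 0.0
    for name, weights in heads.items():
        student_probs = softmax(linear_head(features, weights), temperature)
        teacher_probs = softmax(teacher_logits[name], temperature)
        total += kl_divergence(teacher_probs, student_probs)
    return total


# Toy example: one task-specific teacher and one generalist teacher.
features = [0.5, -1.0, 2.0]
heads = {
    "task_specific": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "generalist": [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 1.0, 1.0]],
}
teacher_logits = {
    "task_specific": [0.2, -0.3],
    "generalist": [1.0, 0.0, 0.5],
}
loss = multi_head_distillation_loss(features, heads, teacher_logits)
```

Because every head reads the same `features`, gradients from all teachers would flow into one shared backbone during training. That shared backbone is what lets a single student consolidate several teachers' representations in this sketch.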

updated: Fri Jul 16 2021 17:58:18 GMT+0000 (UTC)

published: Fri Jul 16 2021 17:58:18 GMT+0000 (UTC)

arXiv
