Adaptive Transfer Learning: a simple but effective transfer learning

Jung H Lee; Henry J Kvinge; Scott Howland; Zachary New; John Buckheit; Lauren A. Phillips; Elliott Skomski; Jessica Hibler; Courtney D. Corley; Nathan O. Hodas

Transfer learning (TL) leverages previously obtained knowledge to learn new tasks efficiently and has been used to train deep learning (DL) models with limited amounts of data. When TL is applied to DL, pretrained (teacher) models are fine-tuned to build domain-specific (student) models. This fine-tuning relies on the fact that a DL model can be decomposed into a classifier and a feature extractor, and a line of studies has shown that the same feature extractor can be used to train classifiers on multiple tasks. Furthermore, recent studies have proposed multiple algorithms that fine-tune the teacher model's feature extractor to train student models more efficiently. We note that, regardless of whether the feature extractor is fine-tuned, the classifiers of student models are trained on the final outputs of the feature extractor (i.e., the outputs of the penultimate layer). However, a recent study suggested that feature maps across layers in ResNets could be functionally equivalent, raising the possibility that feature maps inside the feature extractor can also be used to train student models' classifiers. Inspired by this study, we tested whether feature maps in the hidden layers of teacher models can be used to improve student models' accuracy (i.e., TL's efficiency). Specifically, we developed 'adaptive transfer learning (ATL)', which can choose an optimal set of feature maps for TL, and tested it in the few-shot learning setting. Our empirical evaluations suggest that ATL can help DL models learn more efficiently, especially when available examples are limited.
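
The idea of training a student classifier on hidden-layer feature maps rather than only on the penultimate layer can be illustrated with a minimal sketch. This is not the authors' ATL algorithm, whose selection procedure the abstract does not describe. It assumes that feature maps from each teacher layer have already been extracted and flattened into vectors, uses a nearest-centroid classifier as a stand-in for the student classifier, and picks a single layer by validation accuracy (the abstract speaks of a set of feature maps). The names `select_layer`, `make_split`, `"hidden"`, and `"penultimate"` are hypothetical, and the data are synthetic.

```python
import numpy as np


def nearest_centroid_fit(X, y):
    """Compute one mean feature vector (centroid) per class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def nearest_centroid_predict(model, X):
    """Assign each row of X to the class with the closest centroid."""
    classes, centroids = model
    sq_dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[sq_dist.argmin(axis=1)]


def select_layer(train_feats, y_train, val_feats, y_val):
    """Train a classifier per candidate layer and keep the best one.

    train_feats / val_feats: dict mapping a layer name to an (n, d) array
    of flattened feature maps from that teacher layer (assumed precomputed).
    Selection by validation accuracy is an assumption of this sketch.
    """
    scores = {}
    for name in train_feats:
        model = nearest_centroid_fit(train_feats[name], y_train)
        pred = nearest_centroid_predict(model, val_feats[name])
        scores[name] = float((pred == y_val).mean())
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic stand-in for teacher features: the "hidden" layer separates the
# two classes, while the "penultimate" layer is pure noise for this task.
rng = np.random.default_rng(0)


def make_split(n_per_class):
    y = np.repeat([0, 1], n_per_class)
    hidden = rng.normal(size=(y.size, 8)) + 10.0 * y[:, None]
    penultimate = rng.normal(size=(y.size, 8))
    return {"hidden": hidden, "penultimate": penultimate}, y


train_feats, y_train = make_split(5)   # 5-shot training set
val_feats, y_val = make_split(20)      # held-out validation set

best, scores = select_layer(train_feats, y_train, val_feats, y_val)
print("selected layer:", best)
print("validation accuracy per layer:", scores)
```

In a real setting the dictionaries would be filled with activations captured from a pretrained network, and a trained linear or neural classifier would likely replace the nearest-centroid rule.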

updated: Mon Nov 22 2021 01:22:25 GMT+0000 (UTC)

published: Mon Nov 22 2021 01:22:25 GMT+0000 (UTC)

arXiv
