Unsupervised Finetuning

Suichan Li; Dongdong Chen; Yinpeng Chen; Lu Yuan; Lei Zhang; Qi Chu; Bin Liu; Nenghai Yu

This paper studies "unsupervised finetuning", the symmetric counterpart of the well-known "supervised finetuning". Given a pretrained model and small-scale unlabeled target data, unsupervised finetuning adapts the representation pretrained on the source domain to the target domain so that better transfer performance can be obtained. This problem is more challenging than its supervised counterpart: the low data density of small-scale target data is unfriendly to unsupervised learning, which damages the pretrained representation and yields a poor representation in the target domain. In this paper, we find that the source data is crucial when shifting the finetuning paradigm from supervised to unsupervised, and we propose two simple and effective strategies for combining source and target data in unsupervised finetuning: "sparse source data replaying" and "data mixing". The former adds a small portion of the source data back so that it occupies its pretrained representation space and helps push the target data into a smaller, more compact space; the latter increases the data density and helps learn a more compact representation. To demonstrate the effectiveness of the proposed "unsupervised finetuning" strategies, we conduct extensive experiments on multiple target datasets, which show better transfer performance than the naive strategy.
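To make the two strategies concrete, the following is a minimal sketch of how a finetuning set might be assembled. It is not the authors' implementation. The abstract does not say how "data mixing" is performed, so it is interpreted here, as an assumption, as a mixup-style convex combination of one target sample and one source sample. The function names, the `replay_ratio` and `n_mixed` parameters, and the use of plain lists of floats as samples are all hypothetical.

```python
import random


def replay_sparse_source(target, source, replay_ratio=0.1, seed=0):
    """Return the target samples plus a small random subset of source samples.

    replay_ratio is a hypothetical knob: the fraction of the source set
    that is replayed alongside the target data.
    """
    rng = random.Random(seed)
    # Replay at least one sample, but never more than the source set holds.
    n_replay = min(len(source), max(1, int(len(source) * replay_ratio)))
    return list(target) + rng.sample(source, n_replay)


def mix_samples(x_target, x_source, lam):
    """Mixup-style convex combination of two equal-length feature vectors.

    Assumption: the abstract only names "data mixing"; this interpolation
    is one plausible reading, not the paper's confirmed procedure.
    """
    if len(x_target) != len(x_source):
        raise ValueError("samples must have the same length")
    return [lam * t + (1.0 - lam) * s for t, s in zip(x_target, x_source)]


def build_finetuning_set(target, source, replay_ratio=0.1, n_mixed=4, seed=0):
    """Combine target data, sparsely replayed source data, and mixed samples."""
    rng = random.Random(seed)
    combined = replay_sparse_source(target, source, replay_ratio, seed)
    mixed = []
    for _ in range(n_mixed):
        t = rng.choice(target)
        s = rng.choice(source)
        lam = rng.uniform(0.0, 1.0)  # mixing coefficient in [0, 1]
        mixed.append(mix_samples(t, s, lam))
    return combined + mixed
```

The resulting set would then be fed to whatever unsupervised objective is used for finetuning. Both the choice of objective and the right replay fraction are outside what the abstract states.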

updated: Mon Oct 18 2021 17:57:05 GMT+0000 (UTC)

published: Mon Oct 18 2021 17:57:05 GMT+0000 (UTC)

arXiv
