Progressive Domain Adaptation from a Source Pre-trained Model

Youngeun Kim; Donghyeon Cho; Priyadarshini Panda; Sungeun Hong

Domain adaptation assumes that samples from both the source and target domains are freely accessible during the training phase. However, this assumption is rarely plausible in the real world and can raise data-privacy issues, especially when the source-domain label is itself a sensitive attribute that can serve as an identifier. To avoid accessing source data that may contain sensitive information, we introduce progressive domain adaptation (PrDA). Our key idea is to leverage a model pre-trained on the source domain and to progressively update the target model in a self-learning manner. We observe that target samples with lower self-entropy, as measured by the pre-trained source model, are more likely to be classified correctly. Based on this observation, we select reliable samples using the self-entropy criterion and define them as class prototypes. We then assign a pseudo label to every target sample based on its similarity score with the class prototypes. Furthermore, to reduce the uncertainty of the pseudo-labeling process, we propose set-to-set distance-based filtering, which does not require any tunable hyperparameters. Finally, we train the target model on the filtered pseudo labels, with regularization from the pre-trained source model. Surprisingly, without any direct use of labeled source samples, PrDA outperforms conventional domain adaptation methods on benchmark datasets. Our code is publicly available at https://github.com/youngryan1993/PrDA-Progressive-Domain-Adaptation-from-a-Source-Pre-trained-Model.
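The pseudo-labeling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the feature space, the similarity measure, the number of prototypes per class, or the exact filtering rule. The sketch assumes cosine distance on toy 2-D features, a hypothetical `per_class` prototype count, and one plausible reading of the set-to-set filter (keep a sample only if its farthest prototype in the nearest class is still closer than its closest prototype in the second-nearest class). All function names are illustrative.

```python
import math

def self_entropy(probs):
    """Shannon entropy of a probability vector, normalised to [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def cosine_distance(a, b):
    """1 - cosine similarity (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def select_prototypes(features, probs, per_class=2):
    """For each predicted class, keep the `per_class` lowest-entropy samples."""
    by_class = {}
    for feat, p in zip(features, probs):
        label = max(range(len(p)), key=p.__getitem__)
        by_class.setdefault(label, []).append((self_entropy(p), feat))
    return {
        c: [f for _, f in sorted(items, key=lambda t: t[0])[:per_class]]
        for c, items in by_class.items()
    }

def pseudo_label(feature, prototypes):
    """Return (label, keep). `keep` is the assumed set-to-set filter."""
    mean_dist = {
        c: sum(cosine_distance(feature, p) for p in protos) / len(protos)
        for c, protos in prototypes.items()
    }
    ranked = sorted(mean_dist, key=mean_dist.get)
    best = ranked[0]
    if len(ranked) < 2:
        return best, True
    second = ranked[1]
    # Farthest prototype of the nearest class vs. closest prototype of the runner-up.
    farthest_best = max(cosine_distance(feature, p) for p in prototypes[best])
    closest_second = min(cosine_distance(feature, p) for p in prototypes[second])
    return best, farthest_best < closest_second

# Toy target features and the (hypothetical) source-model softmax outputs for them.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
probs = [[0.99, 0.01], [0.9, 0.1], [0.02, 0.98], [0.2, 0.8], [0.55, 0.45]]

prototypes = select_prototypes(features, probs, per_class=2)
confident = pseudo_label([1.0, 0.05], prototypes)   # clearly near class 0
ambiguous = pseudo_label([0.7, 0.75], prototypes)   # between the two classes
print(confident, ambiguous)
```

In this toy setup the high-entropy sample `[0.7, 0.7]` is never chosen as a prototype, and the ambiguous query receives a label but is filtered out. The filter compares two distances directly, so it introduces no threshold of its own, which is consistent with the hyperparameter-free property the abstract claims.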

updated: Mon Dec 14 2020 15:57:52 GMT+0000 (UTC)

published: Fri Jul 03 2020 07:21:30 GMT+0000 (UTC)

arXiv
