Bi-tuning of Pre-trained Representations

Jincheng Zhong; Ximei Wang; Zhi Kou; Jianmin Wang; Mingsheng Long

It is common within the deep learning community to first pre-train a deep neural network on a large-scale dataset and then fine-tune the pre-trained model on a specific downstream task. Recently, both supervised and unsupervised pre-training approaches to representation learning have achieved remarkable advances; they exploit, respectively, the discriminative knowledge of labels and the intrinsic structure of data. Intuitively, both the discriminative knowledge and the intrinsic structure of the downstream task should be useful for fine-tuning; however, existing fine-tuning methods mainly leverage the former and discard the latter. This raises a question: how can the intrinsic structure of data be fully explored to boost fine-tuning? In this paper, we propose Bi-tuning, a general learning framework for fine-tuning both supervised and unsupervised pre-trained representations on downstream tasks. Bi-tuning generalizes vanilla fine-tuning by integrating two heads on top of the backbone of pre-trained representations: a classifier head with an improved contrastive cross-entropy loss that better leverages label information in an instance-contrast way, and a projector head with a newly designed categorical contrastive learning loss that fully exploits the intrinsic structure of data in a category-consistent way. Comprehensive experiments confirm that Bi-tuning achieves state-of-the-art results, by large margins, for fine-tuning both supervised and unsupervised pre-trained models (e.g., a 10.7% absolute rise in accuracy on CUB in the low-data regime).
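
The abstract names the two heads but does not give their loss formulas. As an illustration only, the following Python sketch assumes a supervised-contrastive (SupCon-style) loss for the projector head, in which batch samples sharing a label act as positives, and uses plain softmax cross-entropy as a stand-in for the classifier head's contrastive cross-entropy loss. All function names (`cross_entropy`, `categorical_contrastive_loss`, `bi_tuning_style_loss`) and the `weight` and `temperature` parameters are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

# Illustrative sketch only: these losses are assumptions, not the paper's exact formulas.


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _normalize(v: Sequence[float]) -> List[float]:
    # L2-normalize so dot products are cosine similarities (zero vectors left as-is).
    norm = math.sqrt(_dot(v, v)) or 1.0
    return [x / norm for x in v]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    # Classifier head: plain softmax cross-entropy, a stand-in for the
    # paper's contrastive cross-entropy loss (assumption).
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[label]


def categorical_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.07,
) -> float:
    # Projector head (assumed SupCon-style): for each anchor, other samples with
    # the same label are positives; every other sample appears in the denominator.
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    if n < 2:
        return 0.0
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        logits = {j: _dot(z[i], z[j]) / temperature for j in others}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # anchor has no same-category partner in this batch
        # Mean negative log-probability of the anchor's positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def bi_tuning_style_loss(
    class_logits: Sequence[Sequence[float]],
    proj_embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    weight: float = 1.0,
    temperature: float = 0.07,
) -> float:
    # Hypothetical combination: classifier-head loss plus weighted projector-head
    # loss, both assumed to be computed from features of one shared backbone.
    ce = sum(cross_entropy(l, y) for l, y in zip(class_logits, labels)) / len(labels)
    return ce + weight * categorical_contrastive_loss(proj_embeddings, labels, temperature)
```

Embeddings that cluster by category, e.g. `[[1, 0], [0.9, 0.1], [0, 1], [0.1, 0.9]]` with labels `[0, 0, 1, 1]`, give a much lower projector-head loss than the same embeddings with labels `[0, 1, 0, 1]`; this is the kind of category-consistent structure the projector head is meant to reward.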

updated: Thu Nov 12 2020 03:32:25 GMT+0000 (UTC)

published: Thu Nov 12 2020 03:32:25 GMT+0000 (UTC)

arXiv